2025-12-04T08:56:28.0387619Z Current runner version: '2.330.0' 2025-12-04T08:56:28.0394665Z Runner name: 'i-02e8ffc45eb447a37' 2025-12-04T08:56:28.0395664Z Runner group name: 'Default' 2025-12-04T08:56:28.0396555Z Machine name: 'ip-10-1-32-85' 2025-12-04T08:56:28.0399848Z ##[group]GITHUB_TOKEN Permissions 2025-12-04T08:56:28.0402396Z Contents: read 2025-12-04T08:56:28.0402984Z Metadata: read 2025-12-04T08:56:28.0403565Z ##[endgroup] 2025-12-04T08:56:28.0405839Z Secret source: Actions 2025-12-04T08:56:28.0406651Z Prepare workflow directory 2025-12-04T08:56:28.0973377Z Prepare all required actions 2025-12-04T08:56:28.1018958Z Getting action download info 2025-12-04T08:56:28.5133234Z Download action repository 'pytorch/test-infra@main' (SHA:39aa74d619174326f4e2fb0e216151c2f29d9ffd) 2025-12-04T08:56:30.9112833Z Download action repository 'pytorch/pytorch@main' (SHA:eabb7ad2128580ef674446027b95bcf4e21e8df3) 2025-12-04T08:56:46.6027636Z Download action repository 'actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065' (SHA:a26af69be951a213d495a4c3e4e4022e16d87065) 2025-12-04T08:56:46.9599728Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722) 2025-12-04T08:56:47.2038259Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076) 2025-12-04T08:56:47.4310600Z Download action repository 'seemethere/download-artifact-s3@1da556a7aa0a088e3153970611f6c432d58e80e6' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6) 2025-12-04T08:56:47.7150697Z Download action repository 'seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a' (SHA:baba72d0712b404f646cebe0730933554ebce96a) 2025-12-04T08:56:48.0029691Z Getting action download info 2025-12-04T08:56:48.1308047Z Download action repository 'actions/checkout@v4' (SHA:34e114876b0b11c390a56381ad16ebd13914f8d5) 2025-12-04T08:56:48.4002235Z Getting action download info 2025-12-04T08:56:48.5146342Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e) 2025-12-04T08:56:48.7366261Z Getting action download info 2025-12-04T08:56:48.9192730Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482) 2025-12-04T08:56:49.1182538Z Getting action download info 2025-12-04T08:56:49.2617309Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/heads/main (ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32) 2025-12-04T08:56:49.2621756Z ##[group] Inputs 2025-12-04T08:56:49.2622198Z build-environment: linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T08:56:49.2634085Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 
3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]} 2025-12-04T08:56:49.2645545Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:56:49.2646435Z sync-tag: 2025-12-04T08:56:49.2647249Z timeout-minutes: 360 2025-12-04T08:56:49.2647529Z use-gha: 2025-12-04T08:56:49.2647758Z dashboard-tag: 2025-12-04T08:56:49.2648006Z s3-bucket: gha-artifacts 2025-12-04T08:56:49.2648303Z aws-role-to-assume: 2025-12-04T08:56:49.2648911Z disable-monitor: false 2025-12-04T08:56:49.2649216Z monitor-log-interval: 5 2025-12-04T08:56:49.2649537Z monitor-data-collect-interval: 1 2025-12-04T08:56:49.2649867Z ##[endgroup] 2025-12-04T08:56:49.2650483Z Complete job name: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 1, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T08:56:49.3368151Z A job started hook has been configured by the self-hosted runner administrator 2025-12-04T08:56:49.3479705Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh' 2025-12-04T08:56:49.3490204Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:56:49.3491124Z ##[endgroup] 2025-12-04T08:56:50.8798029Z Runner Type: lf.linux.g4dn.12xlarge.nvidia.gpu 2025-12-04T08:56:50.8798623Z Instance Type: g4dn.12xlarge 2025-12-04T08:56:50.8798942Z AMI Name: unknown 
2025-12-04T08:56:50.8828761Z AMI ID: ami-08982f1c5bf93d976 2025-12-04T08:56:56.3776470Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main 2025-12-04T08:56:56.3776990Z with: 2025-12-04T08:56:56.3777629Z github-secret: *** 2025-12-04T08:56:56.3778465Z instructions: All testing is done inside the container, to start an interactive session run: docker exec -it $(docker container ps --format '{{.ID}}') bash 2025-12-04T08:56:56.3779672Z activate-with-label: false 2025-12-04T08:56:56.3780001Z label: with-ssh 2025-12-04T08:56:56.3780276Z remove-existing-keys: true 2025-12-04T08:56:56.3780602Z fail-silently: true 2025-12-04T08:56:56.3780878Z env: 2025-12-04T08:56:56.3781109Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:56:56.3781421Z ##[endgroup] 2025-12-04T08:56:56.5040291Z Please see https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions for more info. 2025-12-04T08:56:56.5041593Z Not on pull request and ciflow reference could not be extracted, skipping adding ssh keys 2025-12-04T08:56:56.5187745Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main 2025-12-04T08:56:56.5188412Z with: 2025-12-04T08:56:56.5188660Z no-sudo: true 2025-12-04T08:56:56.5188935Z submodules: recursive 2025-12-04T08:56:56.5189220Z fetch-depth: 0 2025-12-04T08:56:56.5189482Z env: 2025-12-04T08:56:56.5189726Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:56:56.5190030Z ##[endgroup] 2025-12-04T08:56:56.5268275Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T08:56:56.5269394Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T08:56:56.5277347Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:56:56.5277754Z env: 2025-12-04T08:56:56.5278006Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:56:56.5278311Z ##[endgroup] 2025-12-04T08:56:56.5363185Z ##[group]Run # Use all available CPUs for fetching 2025-12-04T08:56:56.5363645Z # Use all available CPUs for fetching 2025-12-04T08:56:56.5364014Z cd "${GITHUB_WORKSPACE}" 2025-12-04T08:56:56.5364371Z git config --global fetch.parallel 0 2025-12-04T08:56:56.5364781Z git config --global submodule.fetchJobs 0 2025-12-04T08:56:56.5365141Z  2025-12-04T08:56:56.5365574Z # Clean workspace. 
The default checkout action should also do this, but 2025-12-04T08:56:56.5366093Z # do it here as well just in case 2025-12-04T08:56:56.5366432Z if [[ -d .git ]]; then 2025-12-04T08:56:56.5366742Z  if [ -z "${NO_SUDO}" ]; then 2025-12-04T08:56:56.5367064Z  sudo git clean -ffdx 2025-12-04T08:56:56.5367362Z  else 2025-12-04T08:56:56.5367607Z  git clean -ffdx 2025-12-04T08:56:56.5367868Z  fi 2025-12-04T08:56:56.5368095Z fi 2025-12-04T08:56:56.5374169Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:56:56.5374602Z env: 2025-12-04T08:56:56.5374872Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:56:56.5375174Z NO_SUDO: true 2025-12-04T08:56:56.5375417Z ##[endgroup] 2025-12-04T08:56:56.5500097Z ##[group]Run actions/checkout@v4 2025-12-04T08:56:56.5500419Z with: 2025-12-04T08:56:56.5500681Z ref: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:56:56.5501018Z fetch-depth: 0 2025-12-04T08:56:56.5501265Z submodules: recursive 2025-12-04T08:56:56.5501534Z show-progress: false 2025-12-04T08:56:56.5501800Z repository: pytorch/pytorch 2025-12-04T08:56:56.5502223Z token: *** 2025-12-04T08:56:56.5502459Z ssh-strict: true 2025-12-04T08:56:56.5502703Z ssh-user: git 2025-12-04T08:56:56.5502940Z persist-credentials: true 2025-12-04T08:56:56.5503222Z clean: true 2025-12-04T08:56:56.5503476Z sparse-checkout-cone-mode: true 2025-12-04T08:56:56.5503772Z fetch-tags: false 2025-12-04T08:56:56.5504012Z lfs: false 2025-12-04T08:56:56.5504244Z set-safe-directory: true 2025-12-04T08:56:56.5504505Z env: 2025-12-04T08:56:56.5504722Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:56:56.5504997Z ##[endgroup] 2025-12-04T08:56:56.6706065Z Syncing repository: pytorch/pytorch 2025-12-04T08:56:56.6707607Z ##[group]Getting Git version info 2025-12-04T08:56:56.6708128Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch' 2025-12-04T08:56:56.6709123Z [command]/usr/bin/git version 2025-12-04T08:56:56.6709437Z git version 2.50.1 2025-12-04T08:56:56.6722367Z ##[endgroup] 2025-12-04T08:56:56.6732914Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/7921fbed-a0a5-4e55-9093-91d280a939ee/.gitconfig' 2025-12-04T08:56:56.6756954Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/7921fbed-a0a5-4e55-9093-91d280a939ee' before making global git config changes 2025-12-04T08:56:56.6758134Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T08:56:56.6761096Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T08:56:56.6793449Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch' 2025-12-04T08:56:56.6796932Z ##[group]Initializing the repository 2025-12-04T08:56:56.6801649Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T08:56:56.6836747Z hint: Using 'master' as the name for the initial branch. This default branch name 2025-12-04T08:56:56.6837827Z hint: is subject to change. To configure the initial branch name to use in all 2025-12-04T08:56:56.6838584Z hint: of your new repositories, which will suppress this warning, call: 2025-12-04T08:56:56.6839064Z hint: 2025-12-04T08:56:56.6839410Z hint: git config --global init.defaultBranch 2025-12-04T08:56:56.6839797Z hint: 2025-12-04T08:56:56.6840180Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and 2025-12-04T08:56:56.6840847Z hint: 'development'. 
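Restated as one readable block, the "Clean workspace" step logged above does the following (nothing new is added here; NO_SUDO: true is set in the step's env, so the non-sudo branch is the one that runs in this job):

  # Restated from the logged step: configure parallel fetching, then remove
  # untracked files from any existing checkout, using sudo only when NO_SUDO is unset.
  cd "${GITHUB_WORKSPACE}"
  git config --global fetch.parallel 0
  git config --global submodule.fetchJobs 0
  # Clean workspace. The default checkout action should also do this, but
  # do it here as well just in case
  if [[ -d .git ]]; then
    if [ -z "${NO_SUDO}" ]; then
      sudo git clean -ffdx
    else
      git clean -ffdx
    fi
  fi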
The just-created branch can be renamed via this command: 2025-12-04T08:56:56.6841333Z hint: 2025-12-04T08:56:56.6841586Z hint: git branch -m 2025-12-04T08:56:56.6841879Z hint: 2025-12-04T08:56:56.6842281Z hint: Disable this message with "git config set advice.defaultBranchName false" 2025-12-04T08:56:56.6843080Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/ 2025-12-04T08:56:56.6845838Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch 2025-12-04T08:56:56.6874430Z ##[endgroup] 2025-12-04T08:56:56.6874982Z ##[group]Disabling automatic garbage collection 2025-12-04T08:56:56.6876755Z [command]/usr/bin/git config --local gc.auto 0 2025-12-04T08:56:56.6904130Z ##[endgroup] 2025-12-04T08:56:56.6904597Z ##[group]Setting up auth 2025-12-04T08:56:56.6910466Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T08:56:56.6937535Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T08:56:56.7258556Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T08:56:56.7283709Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-12-04T08:56:56.7581259Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:56:56.7607370Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T08:56:56.7896984Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T08:56:56.7959090Z ##[endgroup] 2025-12-04T08:56:56.7959703Z ##[group]Fetching the repository 2025-12-04T08:56:56.7966089Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/* 2025-12-04T08:57:43.5017006Z From https://github.com/pytorch/pytorch 2025-12-04T08:57:43.5017600Z * [new branch] 2.6.0.dev20241004+ -> origin/2.6.0.dev20241004+ 2025-12-04T08:57:43.5018302Z * [new branch] 2.9.1 -> origin/2.9.1 2025-12-04T08:57:43.5019028Z * [new branch] AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest 2025-12-04T08:57:43.5019797Z * [new branch] Flamefire-patch-1 -> origin/Flamefire-patch-1 2025-12-04T08:57:43.5020500Z * [new branch] HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes 2025-12-04T08:57:43.5021180Z * [new branch] HOPrintFunc -> origin/HOPrintFunc 2025-12-04T08:57:43.5022552Z * [new branch] IvanKobzarev/stack/1 -> origin/IvanKobzarev/stack/1 2025-12-04T08:57:43.5024789Z * [new branch] NicoshevSVE128 -> origin/NicoshevSVE128 2025-12-04T08:57:43.5025714Z * [new branch] PR-AOTInductorNoneBug -> origin/PR-AOTInductorNoneBug 2025-12-04T08:57:43.5026997Z * [new branch] PR-AOTInductorNoneBugFix -> origin/PR-AOTInductorNoneBugFix 2025-12-04T08:57:43.5028173Z * [new branch] PR-FixConfigsIssue -> origin/PR-FixConfigsIssue 2025-12-04T08:57:43.5029219Z * [new branch] PR-NoneBugFix-viable -> origin/PR-NoneBugFix-viable 2025-12-04T08:57:43.5030336Z * [new branch] PR-ResetToZero -> origin/PR-ResetToZero 2025-12-04T08:57:43.5031542Z * [new branch] Update-Flash-Packaging -> 
origin/Update-Flash-Packaging 2025-12-04T08:57:43.5032556Z * [new branch] VLA_exp -> origin/VLA_exp 2025-12-04T08:57:43.5034097Z * [new branch] activation_bench -> origin/activation_bench 2025-12-04T08:57:43.5035235Z * [new branch] addmm-heuristic -> origin/addmm-heuristic 2025-12-04T08:57:43.5036765Z * [new branch] adi/onednn_aarch64 -> origin/adi/onednn_aarch64 2025-12-04T08:57:43.5037829Z * [new branch] adi/test -> origin/adi/test 2025-12-04T08:57:43.5038946Z * [new branch] adi/test_bgemm -> origin/adi/test_bgemm 2025-12-04T08:57:43.5040079Z * [new branch] adi/test_m8g -> origin/adi/test_m8g 2025-12-04T08:57:43.5041176Z * [new branch] adi/test_onednn -> origin/adi/test_onednn 2025-12-04T08:57:43.5042292Z * [new branch] adi/test_onednn_v3.9 -> origin/adi/test_onednn_v3.9 2025-12-04T08:57:43.5043423Z * [new branch] adi/test_presve_change -> origin/adi/test_presve_change 2025-12-04T08:57:43.5044448Z * [new branch] adi/test_timm -> origin/adi/test_timm 2025-12-04T08:57:43.5045942Z * [new branch] adi/testpresve_change -> origin/adi/testpresve_change 2025-12-04T08:57:43.5047959Z * [new branch] aditew01/test/vec_bf16 -> origin/aditew01/test/vec_bf16 2025-12-04T08:57:43.5049053Z * [new branch] ah-globalfeedback-hook -> origin/ah-globalfeedback-hook 2025-12-04T08:57:43.5050536Z * [new branch] albanD-patch-1 -> origin/albanD-patch-1 2025-12-04T08:57:43.5051477Z * [new branch] also-surround-shimh -> origin/also-surround-shimh 2025-12-04T08:57:43.5053198Z * [new branch] angelayi/aot_compile -> origin/angelayi/aot_compile 2025-12-04T08:57:43.5054688Z * [new branch] angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files 2025-12-04T08:57:43.5055680Z * [new branch] angelayi/benchmark -> origin/angelayi/benchmark 2025-12-04T08:57:43.5056953Z * [new branch] angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization 2025-12-04T08:57:43.5057913Z * [new branch] angelayi/cpp_loader -> origin/angelayi/cpp_loader 2025-12-04T08:57:43.5059040Z * [new branch] angelayi/inductor_const -> origin/angelayi/inductor_const 2025-12-04T08:57:43.5060057Z * [new branch] angelayi/lstm -> origin/angelayi/lstm 2025-12-04T08:57:43.5061736Z * [new branch] angelayi/no_so_weight -> origin/angelayi/no_so_weight 2025-12-04T08:57:43.5063169Z * [new branch] angelayi/scan_layers -> origin/angelayi/scan_layers 2025-12-04T08:57:43.5064297Z * [new branch] angelayi/side_eff -> origin/angelayi/side_eff 2025-12-04T08:57:43.5065598Z * [new branch] angelayi/state_dict -> origin/angelayi/state_dict 2025-12-04T08:57:43.5066799Z * [new branch] angelayi/symint_input -> origin/angelayi/symint_input 2025-12-04T08:57:43.5068128Z * [new branch] angelayi/symm_mem -> origin/angelayi/symm_mem 2025-12-04T08:57:43.5069112Z * [new branch] angelayi/test_cpp -> origin/angelayi/test_cpp 2025-12-04T08:57:43.5070246Z * [new branch] angelayi/torch_size -> origin/angelayi/torch_size 2025-12-04T08:57:43.5071368Z * [new branch] annotate_assert -> origin/annotate_assert 2025-12-04T08:57:43.5072578Z * [new branch] annotate_fallback_kernel -> origin/annotate_fallback_kernel 2025-12-04T08:57:43.5073670Z * [new branch] annotation_deepcopy -> origin/annotation_deepcopy 2025-12-04T08:57:43.5074781Z * [new branch] annotation_dynamo -> origin/annotation_dynamo 2025-12-04T08:57:43.5075939Z * [new branch] aot_eager_stack_trace -> origin/aot_eager_stack_trace 2025-12-04T08:57:43.5077044Z * [new branch] aoti-cuda-alloc -> origin/aoti-cuda-alloc 2025-12-04T08:57:43.5078153Z * [new branch] aoti_const_device -> origin/aoti_const_device 
2025-12-04T08:57:43.5079868Z * [new branch] aoti_fqn_name_interface -> origin/aoti_fqn_name_interface 2025-12-04T08:57:43.5080948Z * [new branch] aoti_package_weights_binary -> origin/aoti_package_weights_binary 2025-12-04T08:57:43.5082045Z * [new branch] aoti_target_windows -> origin/aoti_target_windows 2025-12-04T08:57:43.5084201Z * [new branch] arsh/feat/inductor_check_profiling -> origin/arsh/feat/inductor_check_profiling 2025-12-04T08:57:43.5085214Z * [new branch] async_tp -> origin/async_tp 2025-12-04T08:57:43.5086850Z * [new branch] atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124 2025-12-04T08:57:43.5088081Z * [new branch] atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1 2025-12-04T08:57:43.5089223Z * [new branch] atalman-patch-2 -> origin/atalman-patch-2 2025-12-04T08:57:43.5090466Z * [new branch] atalman-patch-3 -> origin/atalman-patch-3 2025-12-04T08:57:43.5091794Z * [new branch] atalman-patch-4 -> origin/atalman-patch-4 2025-12-04T08:57:43.5093066Z * [new branch] atalman-patch-5 -> origin/atalman-patch-5 2025-12-04T08:57:43.5094668Z * [new branch] atalman-patch-6 -> origin/atalman-patch-6 2025-12-04T08:57:43.5095839Z * [new branch] atalman-patch-7 -> origin/atalman-patch-7 2025-12-04T08:57:43.5097084Z * [new branch] atalman-patch-8 -> origin/atalman-patch-8 2025-12-04T08:57:43.5098237Z * [new branch] atalman_inductor_2.3.1 -> origin/atalman_inductor_2.3.1 2025-12-04T08:57:43.5099392Z * [new branch] atalman_inductor_2.4.0 -> origin/atalman_inductor_2.4.0 2025-12-04T08:57:43.5100659Z * [new branch] atalman_inductor_2.4.x -> origin/atalman_inductor_2.4.x 2025-12-04T08:57:43.5101938Z * [new branch] attention_benchmarking_clean -> origin/attention_benchmarking_clean 2025-12-04T08:57:43.5103501Z * [new branch] bahuang/dt_fix_scalar_add -> origin/bahuang/dt_fix_scalar_add 2025-12-04T08:57:43.5104570Z * [new branch] bahuang/fix_debug_mode -> origin/bahuang/fix_debug_mode 2025-12-04T08:57:43.5105775Z * [new branch] bahuang/fix_expand -> origin/bahuang/fix_expand 2025-12-04T08:57:43.5106936Z * [new branch] bahuang/test -> origin/bahuang/test 2025-12-04T08:57:43.5108610Z * [new branch] base/1.5 -> origin/base/1.5 2025-12-04T08:57:43.5110063Z * [new branch] batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention 2025-12-04T08:57:43.5111055Z * [new branch] bench_scaled_mm_ops -> origin/bench_scaled_mm_ops 2025-12-04T08:57:43.5112344Z * [new branch] benchmark-updates -> origin/benchmark-updates 2025-12-04T08:57:43.5113326Z * [new branch] benchmarking-script -> origin/benchmarking-script 2025-12-04T08:57:43.5114911Z * [new branch] bertmaher/pinbump26 -> origin/bertmaher/pinbump26 2025-12-04T08:57:43.5116480Z * [new branch] bertrand/cutlass -> origin/bertrand/cutlass 2025-12-04T08:57:43.5117985Z * [new branch] bf/bug-static-input -> origin/bf/bug-static-input 2025-12-04T08:57:43.5119006Z * [new branch] bf/cg-backend -> origin/bf/cg-backend 2025-12-04T08:57:43.5120067Z * [new branch] bf/cg-nccl-test -> origin/bf/cg-nccl-test 2025-12-04T08:57:43.5121118Z * [new branch] bf/cg-remove-check -> origin/bf/cg-remove-check 2025-12-04T08:57:43.5122274Z * [new branch] bf/clean-torchbench-hf -> origin/bf/clean-torchbench-hf 2025-12-04T08:57:43.5123304Z * [new branch] bf/combo-debug-log -> origin/bf/combo-debug-log 2025-12-04T08:57:43.5124364Z * [new branch] bf/cudagraph -> origin/bf/cudagraph 2025-12-04T08:57:43.5126098Z * [new branch] bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation 2025-12-04T08:57:43.5127513Z * 
[new branch] bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark 2025-12-04T08:57:43.5128521Z * [new branch] bf/cudagraph-partition -> origin/bf/cudagraph-partition 2025-12-04T08:57:43.5129455Z * [new branch] bf/donated-buffer-bench -> origin/bf/donated-buffer-bench 2025-12-04T08:57:43.5130611Z * [new branch] bf/dynamo-partition -> origin/bf/dynamo-partition 2025-12-04T08:57:43.5131661Z * [new branch] bf/lite -> origin/bf/lite 2025-12-04T08:57:43.5132767Z * [new branch] bf/pa-non-divisible -> origin/bf/pa-non-divisible 2025-12-04T08:57:43.5134544Z * [new branch] bf/partition-cache-free-symbols -> origin/bf/partition-cache-free-symbols 2025-12-04T08:57:43.5135569Z * [new branch] bf/partition-memory-plan -> origin/bf/partition-memory-plan 2025-12-04T08:57:43.5136727Z * [new branch] bf/partition-move-cpu -> origin/bf/partition-move-cpu 2025-12-04T08:57:43.5137931Z * [new branch] bf/partition-view-fallback -> origin/bf/partition-view-fallback 2025-12-04T08:57:43.5139012Z * [new branch] bf/remove-check-55b0c39d -> origin/bf/remove-check-55b0c39d 2025-12-04T08:57:43.5140120Z * [new branch] bf/timm-nov-26-2025 -> origin/bf/timm-nov-26-2025 2025-12-04T08:57:43.5141361Z * [new branch] bf/transformer-pin-4-57-3 -> origin/bf/transformer-pin-4-57-3 2025-12-04T08:57:43.5142558Z * [new branch] bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492 2025-12-04T08:57:43.5143645Z * [new branch] bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb 2025-12-04T08:57:43.5144784Z * [new branch] bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129 2025-12-04T08:57:43.5146001Z * [new branch] bisect_perf_hf_T5_40d0740e73d -> origin/bisect_perf_hf_T5_40d0740e73d 2025-12-04T08:57:43.5147093Z * [new branch] bisect_perf_hf_T5_5268754e -> origin/bisect_perf_hf_T5_5268754e 2025-12-04T08:57:43.5148135Z * [new branch] bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c 2025-12-04T08:57:43.5149488Z * [new branch] bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c 2025-12-04T08:57:43.5150503Z * [new branch] bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f 2025-12-04T08:57:43.5151620Z * [new branch] bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0 2025-12-04T08:57:43.5152867Z * [new branch] bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149 2025-12-04T08:57:43.5153846Z * [new branch] bisect_perf_hf_T5_d65f194a -> origin/bisect_perf_hf_T5_d65f194a 2025-12-04T08:57:43.5154870Z * [new branch] bisect_perf_hf_T5_da94ab0b -> origin/bisect_perf_hf_T5_da94ab0b 2025-12-04T08:57:43.5155977Z * [new branch] bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new 2025-12-04T08:57:43.5157057Z * [new branch] bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8 2025-12-04T08:57:43.5158091Z * [new branch] bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2 2025-12-04T08:57:43.5159135Z * [new branch] bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563 2025-12-04T08:57:43.5160760Z * [new branch] brister/fx_device_type -> origin/brister/fx_device_type 2025-12-04T08:57:43.5161835Z * [new branch] brister/test_inductor_all_fx -> origin/brister/test_inductor_all_fx 2025-12-04T08:57:43.5162988Z * [new branch] brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check 2025-12-04T08:57:43.5163962Z * [new branch] bwd-backup -> origin/bwd-backup 
2025-12-04T08:57:43.5165281Z * [new branch] c57382a49 -> origin/c57382a49 2025-12-04T08:57:43.5166280Z * [new branch] ca_0431d47eaa -> origin/ca_0431d47eaa 2025-12-04T08:57:43.5167377Z * [new branch] ca_fix_0431d47eaa -> origin/ca_fix_0431d47eaa 2025-12-04T08:57:43.5169077Z * [new branch] camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push 2025-12-04T08:57:43.5170228Z * [new branch] cccclai-patch-1 -> origin/cccclai-patch-1 2025-12-04T08:57:43.5171552Z * [new branch] cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5172711Z * [new branch] cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5174290Z * [new branch] cherry-pick-162208-by-pytorch_bot_bot_ -> origin/cherry-pick-162208-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5175466Z * [new branch] cherry-pick-163169-by-pytorch_bot_bot_ -> origin/cherry-pick-163169-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5176646Z * [new branch] cherry-pick-165086-by-pytorch_bot_bot_ -> origin/cherry-pick-165086-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5177975Z * [new branch] cherry-pick-165514-by-pytorch_bot_bot_ -> origin/cherry-pick-165514-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5179353Z * [new branch] cherry-pick-165601-by-pytorch_bot_bot_ -> origin/cherry-pick-165601-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5180651Z * [new branch] cherry-pick-165667-by-pytorch_bot_bot_ -> origin/cherry-pick-165667-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5181985Z * [new branch] cherry-pick-165815-by-pytorch_bot_bot_ -> origin/cherry-pick-165815-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5183242Z * [new branch] cherry-pick-165922-by-pytorch_bot_bot_ -> origin/cherry-pick-165922-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5184434Z * [new branch] cherry-pick-166148-by-pytorch_bot_bot_ -> origin/cherry-pick-166148-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5185633Z * [new branch] cherry-pick-166181-by-pytorch_bot_bot_ -> origin/cherry-pick-166181-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5186801Z * [new branch] cherry-pick-166404-by-pytorch_bot_bot_ -> origin/cherry-pick-166404-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5188012Z * [new branch] cherry-pick-166427-by-pytorch_bot_bot_ -> origin/cherry-pick-166427-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5189352Z * [new branch] cherry-pick-166480-by-pytorch_bot_bot_ -> origin/cherry-pick-166480-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5190387Z * [new branch] cherry-pick-166570-by-pytorch_bot_bot_ -> origin/cherry-pick-166570-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5191678Z * [new branch] cherry-pick-166993-by-pytorch_bot_bot_ -> origin/cherry-pick-166993-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5192836Z * [new branch] cherry-pick-167111-by-pytorch_bot_bot_ -> origin/cherry-pick-167111-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5194040Z * [new branch] cherry-pick-167478-by-pytorch_bot_bot_ -> origin/cherry-pick-167478-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5195011Z * [new branch] cherry_pick_166036_166040 -> origin/cherry_pick_166036_166040 2025-12-04T08:57:43.5196185Z * [new branch] cherry_pick_166457 -> origin/cherry_pick_166457 2025-12-04T08:57:43.5197391Z * [new branch] cherrypick_166338 -> origin/cherrypick_166338 2025-12-04T08:57:43.5198529Z * [new branch] cherrypick_166458 -> origin/cherrypick_166458 2025-12-04T08:57:43.5199625Z * [new branch] cherrypick_166586 -> origin/cherrypick_166586 2025-12-04T08:57:43.5200753Z * [new branch] cherrypick_166956 -> origin/cherrypick_166956 2025-12-04T08:57:43.5201851Z * [new 
branch] ci_attn -> origin/ci_attn 2025-12-04T08:57:43.5202973Z * [new branch] codex-testing -> origin/codex-testing 2025-12-04T08:57:43.5204936Z * [new branch] codex/add-check_memory_overlap-helper-functions -> origin/codex/add-check_memory_overlap-helper-functions 2025-12-04T08:57:43.5205914Z * [new branch] codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch 2025-12-04T08:57:43.5231937Z * [new branch] codex/investigate-segfaults-in-get_tensor_storage_id -> origin/codex/investigate-segfaults-in-get_tensor_storage_id 2025-12-04T08:57:43.5233341Z * [new branch] codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run 2025-12-04T08:57:43.5234246Z * [new branch] compatiblpy39util -> origin/compatiblpy39util 2025-12-04T08:57:43.5234853Z * [new branch] cond_hop_device -> origin/cond_hop_device 2025-12-04T08:57:43.5235418Z * [new branch] context_test -> origin/context_test 2025-12-04T08:57:43.5236166Z * [new branch] copilot/code-style-cleanup-python-pip -> origin/copilot/code-style-cleanup-python-pip 2025-12-04T08:57:43.5236961Z * [new branch] cpio/fix_new_ami_tests -> origin/cpio/fix_new_ami_tests 2025-12-04T08:57:43.5237659Z * [new branch] cpp-docs-dependency-upgrade -> origin/cpp-docs-dependency-upgrade 2025-12-04T08:57:43.5238367Z * [new branch] csl/always_produce_xml -> origin/csl/always_produce_xml 2025-12-04T08:57:43.5239018Z * [new branch] csl/build_test_more_procs -> origin/csl/build_test_more_procs 2025-12-04T08:57:43.5239697Z * [new branch] csl/build_test_more_procs2 -> origin/csl/build_test_more_procs2 2025-12-04T08:57:43.5240320Z * [new branch] csl/clean_up -> origin/csl/clean_up 2025-12-04T08:57:43.5240945Z * [new branch] csl/fix_retry_segfault_exit -> origin/csl/fix_retry_segfault_exit 2025-12-04T08:57:43.5241540Z * [new branch] csl/katex -> origin/csl/katex 2025-12-04T08:57:43.5242106Z * [new branch] csl/larger_runner -> origin/csl/larger_runner 2025-12-04T08:57:43.5242725Z * [new branch] csl/lint_testing -> origin/csl/lint_testing 2025-12-04T08:57:43.5243326Z * [new branch] csl/lint_thing -> origin/csl/lint_thing 2025-12-04T08:57:43.5243937Z * [new branch] csl/lintrunner_stuff -> origin/csl/lintrunner_stuff 2025-12-04T08:57:43.5244720Z * [new branch] csl/manually_gen_json -> origin/csl/manually_gen_json 2025-12-04T08:57:43.5245410Z * [new branch] csl/mps_sharding -> origin/csl/mps_sharding 2025-12-04T08:57:43.5246014Z * [new branch] csl/multistage_docker -> origin/csl/multistage_docker 2025-12-04T08:57:43.5246639Z * [new branch] csl/print_timing -> origin/csl/print_timing 2025-12-04T08:57:43.5247255Z * [new branch] csl/remove_experiment -> origin/csl/remove_experiment 2025-12-04T08:57:43.5247922Z * [new branch] csl/remove_maybe_unused_var -> origin/csl/remove_maybe_unused_var 2025-12-04T08:57:43.5248667Z * [new branch] csl/remove_repo_specific_autolabel -> origin/csl/remove_repo_specific_autolabel 2025-12-04T08:57:43.5249411Z * [new branch] csl/remove_run_parallel -> origin/csl/remove_run_parallel 2025-12-04T08:57:43.5250056Z * [new branch] csl/remove_unused_vars -> origin/csl/remove_unused_vars 2025-12-04T08:57:43.5250674Z * [new branch] csl/revert_open -> origin/csl/revert_open 2025-12-04T08:57:43.5251236Z * [new branch] csl/skip_build -> origin/csl/skip_build 2025-12-04T08:57:43.5251863Z * [new branch] csl/smaller_avx_amx_runenrs -> origin/csl/smaller_avx_amx_runenrs 2025-12-04T08:57:43.5252491Z * [new branch] csl/td_job_level -> origin/csl/td_job_level 2025-12-04T08:57:43.5253280Z * [new branch] 
csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner 2025-12-04T08:57:43.5254267Z * [new branch] csl/test_owners_autograd_dispatch_nn -> origin/csl/test_owners_autograd_dispatch_nn 2025-12-04T08:57:43.5255131Z * [new branch] csl/test_owners_higher_confidence -> origin/csl/test_owners_higher_confidence 2025-12-04T08:57:43.5255885Z * [new branch] csl/upload_json_running -> origin/csl/upload_json_running 2025-12-04T08:57:43.5256528Z * [new branch] csl/win_sccache -> origin/csl/win_sccache 2025-12-04T08:57:43.5257165Z * [new branch] csl/xml_stuff -> origin/csl/xml_stuff 2025-12-04T08:57:43.5257745Z * [new branch] cublasrelax2 -> origin/cublasrelax2 2025-12-04T08:57:43.5258302Z * [new branch] cuda_mempool -> origin/cuda_mempool 2025-12-04T08:57:43.5258905Z * [new branch] custom_lowering_dict -> origin/custom_lowering_dict 2025-12-04T08:57:43.5259586Z * [new branch] d4l3k/debug_plane_frtrace -> origin/d4l3k/debug_plane_frtrace 2025-12-04T08:57:43.5260209Z * [new branch] daxia6/2.8o3 -> origin/daxia6/2.8o3 2025-12-04T08:57:43.5260775Z * [new branch] debug-guard -> origin/debug-guard 2025-12-04T08:57:43.5261373Z * [new branch] delete-quant-docs -> origin/delete-quant-docs 2025-12-04T08:57:43.5264017Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 2025-12-04T08:57:43.5265704Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 2025-12-04T08:57:43.5266817Z * [new branch] desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper 2025-12-04T08:57:43.5267735Z * [new branch] desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64 2025-12-04T08:57:43.5269496Z * [new branch] dev/dhruva/flex_attn_opt -> origin/dev/dhruva/flex_attn_opt 2025-12-04T08:57:43.5271287Z * [new branch] dev/joona/MPSNDArrayAdd -> origin/dev/joona/MPSNDArrayAdd 2025-12-04T08:57:43.5272518Z * [new branch] dev/joona/Unranked -> origin/dev/joona/Unranked 2025-12-04T08:57:43.5274085Z * [new branch] dev/joona/cat -> origin/dev/joona/cat 2025-12-04T08:57:43.5275141Z * [new branch] dev/joona/embeddingbag -> origin/dev/joona/embeddingbag 2025-12-04T08:57:43.5276369Z * [new branch] dev/joona/fix_sdpa_memtest -> origin/dev/joona/fix_sdpa_memtest 2025-12-04T08:57:43.5278116Z * [new branch] dev/joona/getTensorsString -> origin/dev/joona/getTensorsString 2025-12-04T08:57:43.5279968Z * [new branch] dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14 2025-12-04T08:57:43.5281624Z * [new branch] dev/joona/scalar_clamp -> origin/dev/joona/scalar_clamp 2025-12-04T08:57:43.5283242Z * [new branch] dev/joona/sdpa -> origin/dev/joona/sdpa 2025-12-04T08:57:43.5285054Z * [new branch] dev/joona/sdpa_api -> origin/dev/joona/sdpa_api 2025-12-04T08:57:43.5286424Z * [new branch] dev/joona/type_inf -> origin/dev/joona/type_inf 2025-12-04T08:57:43.5287826Z * [new branch] dev/joona/ulpAssertClose -> origin/dev/joona/ulpAssertClose 2025-12-04T08:57:43.5289125Z * [new branch] dev/joona/upsize3d -> origin/dev/joona/upsize3d 2025-12-04T08:57:43.5290234Z * [new branch] disp_counter -> origin/disp_counter 2025-12-04T08:57:43.5291696Z * [new branch] divyanshk-patch-1 -> origin/divyanshk-patch-1 2025-12-04T08:57:43.5292653Z * [new branch] docs -> origin/docs 2025-12-04T08:57:43.5294223Z * [new branch] documentation -> origin/documentation 2025-12-04T08:57:43.5295357Z * [new branch] 
eager_model_benchmarks -> origin/eager_model_benchmarks 2025-12-04T08:57:43.5297088Z * [new branch] embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control 2025-12-04T08:57:43.5298072Z * [new branch] embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B 2025-12-04T08:57:43.5299095Z * [new branch] embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B 2025-12-04T08:57:43.5300241Z * [new branch] eqy-patch-1 -> origin/eqy-patch-1 2025-12-04T08:57:43.5301552Z * [new branch] eqy-patch-2 -> origin/eqy-patch-2 2025-12-04T08:57:43.5302731Z * [new branch] eqy-patch-3 -> origin/eqy-patch-3 2025-12-04T08:57:43.5303921Z * [new branch] eqy-patch-4 -> origin/eqy-patch-4 2025-12-04T08:57:43.5305199Z * [new branch] eqy-patch-5 -> origin/eqy-patch-5 2025-12-04T08:57:43.5306321Z * [new branch] eqy-patch-6 -> origin/eqy-patch-6 2025-12-04T08:57:43.5307901Z * [new branch] exclamaforte/amd-ma -> origin/exclamaforte/amd-ma 2025-12-04T08:57:43.5309113Z * [new branch] exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run 2025-12-04T08:57:43.5310115Z * [new branch] exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor 2025-12-04T08:57:43.5311273Z * [new branch] exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion 2025-12-04T08:57:43.5312458Z * [new branch] exclamaforte/fix-exhaustive-autotuning -> origin/exclamaforte/fix-exhaustive-autotuning 2025-12-04T08:57:43.5313792Z * [new branch] exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg 2025-12-04T08:57:43.5315189Z * [new branch] exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run 2025-12-04T08:57:43.5316158Z * [new branch] exclamaforte/fusion-data -> origin/exclamaforte/fusion-data 2025-12-04T08:57:43.5317391Z * [new branch] exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run 2025-12-04T08:57:43.5318893Z * [new branch] exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model 2025-12-04T08:57:43.5319728Z * [new branch] exclamaforte/gemm-model -> origin/exclamaforte/gemm-model 2025-12-04T08:57:43.5320875Z * [new branch] exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection 2025-12-04T08:57:43.5321779Z * [new branch] exclamaforte/gemm-to-amd -> origin/exclamaforte/gemm-to-amd 2025-12-04T08:57:43.5322886Z * [new branch] exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model 2025-12-04T08:57:43.5324185Z * [new branch] exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor 2025-12-04T08:57:43.5325263Z * [new branch] exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo 2025-12-04T08:57:43.5326396Z * [new branch] exclamaforte/profiler-visualization -> origin/exclamaforte/profiler-visualization 2025-12-04T08:57:43.5327535Z * [new branch] exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode 2025-12-04T08:57:43.5328705Z * [new branch] exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs 2025-12-04T08:57:43.5329858Z * [new branch] exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2 2025-12-04T08:57:43.5330811Z * [new branch] exec -> origin/exec 2025-12-04T08:57:43.5332236Z * [new branch] experimental-mosaic -> origin/experimental-mosaic 2025-12-04T08:57:43.5333588Z * [new branch] export-D61047529 -> origin/export-D61047529 
2025-12-04T08:57:43.5334925Z * [new branch] export-D71412006 -> origin/export-D71412006 2025-12-04T08:57:43.5336177Z * [new branch] export-D73042989 -> origin/export-D73042989 2025-12-04T08:57:43.5337290Z * [new branch] export-D78957093 -> origin/export-D78957093 2025-12-04T08:57:43.5338430Z * [new branch] export-D78996107 -> origin/export-D78996107 2025-12-04T08:57:43.5339581Z * [new branch] export-D80823877 -> origin/export-D80823877 2025-12-04T08:57:43.5341078Z * [new branch] export-D80958642 -> origin/export-D80958642 2025-12-04T08:57:43.5342190Z * [new branch] export-D81054193 -> origin/export-D81054193 2025-12-04T08:57:43.5343890Z * [new branch] export-D81204584 -> origin/export-D81204584 2025-12-04T08:57:43.5345004Z * [new branch] export-D81429090 -> origin/export-D81429090 2025-12-04T08:57:43.5346411Z * [new branch] export-D82250826 -> origin/export-D82250826 2025-12-04T08:57:43.5347486Z * [new branch] export-D82253817 -> origin/export-D82253817 2025-12-04T08:57:43.5348641Z * [new branch] export-D83541846 -> origin/export-D83541846 2025-12-04T08:57:43.5349764Z * [new branch] export-D83627170 -> origin/export-D83627170 2025-12-04T08:57:43.5350883Z * [new branch] export-D83766701 -> origin/export-D83766701 2025-12-04T08:57:43.5351938Z * [new branch] export-D83768878 -> origin/export-D83768878 2025-12-04T08:57:43.5353087Z * [new branch] export-D83769447 -> origin/export-D83769447 2025-12-04T08:57:43.5354138Z * [new branch] export-D84089824 -> origin/export-D84089824 2025-12-04T08:57:43.5355248Z * [new branch] export-D84213020 -> origin/export-D84213020 2025-12-04T08:57:43.5356925Z * [new branch] export-D84373821 -> origin/export-D84373821 2025-12-04T08:57:43.5358035Z * [new branch] export-D84612194 -> origin/export-D84612194 2025-12-04T08:57:43.5359315Z * [new branch] export-D84890985 -> origin/export-D84890985 2025-12-04T08:57:43.5360288Z * [new branch] export-D85122326 -> origin/export-D85122326 2025-12-04T08:57:43.5361493Z * [new branch] export-D86256198 -> origin/export-D86256198 2025-12-04T08:57:43.5362613Z * [new branch] export-D86460608 -> origin/export-D86460608 2025-12-04T08:57:43.5363922Z * [new branch] export-D86474796 -> origin/export-D86474796 2025-12-04T08:57:43.5365096Z * [new branch] export-D86712396 -> origin/export-D86712396 2025-12-04T08:57:43.5366272Z * [new branch] export-D87022129 -> origin/export-D87022129 2025-12-04T08:57:43.5367567Z * [new branch] export-D87838959 -> origin/export-D87838959 2025-12-04T08:57:43.5368698Z * [new branch] export-D88319437 -> origin/export-D88319437 2025-12-04T08:57:43.5370139Z * [new branch] exported-model-train-idempotent -> origin/exported-model-train-idempotent 2025-12-04T08:57:43.5371091Z * [new branch] ezyang-titan-october -> origin/ezyang-titan-october 2025-12-04T08:57:43.5372190Z * [new branch] ezyang-titan-october2 -> origin/ezyang-titan-october2 2025-12-04T08:57:43.5373337Z * [new branch] ezyang-war -> origin/ezyang-war 2025-12-04T08:57:43.5375262Z * [new branch] ezyang/wip-aot-descriptors -> origin/ezyang/wip-aot-descriptors 2025-12-04T08:57:43.5376262Z * [new branch] fa_u8_brgemm -> origin/fa_u8_brgemm 2025-12-04T08:57:43.5377973Z * [new branch] fadeputr/sequence_fbgemm -> origin/fadeputr/sequence_fbgemm 2025-12-04T08:57:43.5379378Z * [new branch] fastmath_baseline -> origin/fastmath_baseline 2025-12-04T08:57:43.5381083Z * [new branch] fbcode/warm -> origin/fbcode/warm 2025-12-04T08:57:43.5382386Z * [new branch] fca -> origin/fca 2025-12-04T08:57:43.5383477Z * [new branch] fca2_ca5984c -> origin/fca2_ca5984c 
2025-12-04T08:57:43.5384621Z * [new branch] fca5 -> origin/fca5 2025-12-04T08:57:43.5386335Z * [new branch] feature/justknobs-cpp -> origin/feature/justknobs-cpp 2025-12-04T08:57:43.5387434Z * [new branch] feature/numa-forkserver -> origin/feature/numa-forkserver 2025-12-04T08:57:43.5389093Z * [new branch] ffast_math_baseline -> origin/ffast_math_baseline 2025-12-04T08:57:43.5390746Z * [new branch] ffast_math_target -> origin/ffast_math_target 2025-12-04T08:57:43.5392298Z * [new branch] findhao/base_commit -> origin/findhao/base_commit 2025-12-04T08:57:43.5393354Z * [new branch] findhao/base_commit1 -> origin/findhao/base_commit1 2025-12-04T08:57:43.5394445Z * [new branch] findhao/multistream2 -> origin/findhao/multistream2 2025-12-04T08:57:43.5395474Z * [new branch] findhao/multistream5 -> origin/findhao/multistream5 2025-12-04T08:57:43.5396508Z * [new branch] findhao/multistream6 -> origin/findhao/multistream6 2025-12-04T08:57:43.5397570Z * [new branch] findhao/operatorbench3 -> origin/findhao/operatorbench3 2025-12-04T08:57:43.5398576Z * [new branch] findhao/operatorbench5 -> origin/findhao/operatorbench5 2025-12-04T08:57:43.5399592Z * [new branch] findhao/tritonparse -> origin/findhao/tritonparse 2025-12-04T08:57:43.5400810Z * [new branch] fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format 2025-12-04T08:57:43.5401884Z * [new branch] fix-config-ignore -> origin/fix-config-ignore 2025-12-04T08:57:43.5403005Z * [new branch] fix-dict-guard -> origin/fix-dict-guard 2025-12-04T08:57:43.5404520Z * [new branch] fix_addmm_issue -> origin/fix_addmm_issue 2025-12-04T08:57:43.5405494Z * [new branch] fix_amd_missing_cluster_dims -> origin/fix_amd_missing_cluster_dims 2025-12-04T08:57:43.5406553Z * [new branch] fix_bench_bwd_pass -> origin/fix_bench_bwd_pass 2025-12-04T08:57:43.5407667Z * [new branch] fix_mem_profiler_config -> origin/fix_mem_profiler_config 2025-12-04T08:57:43.5408728Z * [new branch] fix_nvrtc_discovery -> origin/fix_nvrtc_discovery 2025-12-04T08:57:43.5409760Z * [new branch] fix_op_runner -> origin/fix_op_runner 2025-12-04T08:57:43.5410908Z * [new branch] fix_ubn_159469 -> origin/fix_ubn_159469 2025-12-04T08:57:43.5412076Z * [new branch] fixes-triage -> origin/fixes-triage 2025-12-04T08:57:43.5413250Z * [new branch] fixflashinfer -> origin/fixflashinfer 2025-12-04T08:57:43.5414639Z * [new branch] flash_decoding_cpu -> origin/flash_decoding_cpu 2025-12-04T08:57:43.5415718Z * [new branch] flex-flash -> origin/flex-flash 2025-12-04T08:57:43.5417023Z * [new branch] flex_attention_functorch_grad -> origin/flex_attention_functorch_grad 2025-12-04T08:57:43.5418038Z * [new branch] flex_flash -> origin/flex_flash 2025-12-04T08:57:43.5419850Z * [new branch] fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule 2025-12-04T08:57:43.5420944Z * [new branch] fmassa/tests_comm_compute_scheduler -> origin/fmassa/tests_comm_compute_scheduler 2025-12-04T08:57:43.5422031Z * [new branch] forkserver_fix -> origin/forkserver_fix 2025-12-04T08:57:43.5423150Z * [new branch] fsdp2_trace_rules -> origin/fsdp2_trace_rules 2025-12-04T08:57:43.5424298Z * [new branch] fx_cpp -> origin/fx_cpp 2025-12-04T08:57:43.5426087Z * [new branch] fy/fix-win -> origin/fy/fix-win 2025-12-04T08:57:43.5427353Z * [new branch] galv-patch-1 -> origin/galv-patch-1 2025-12-04T08:57:43.5429174Z * [new branch] galv/cudagraphs-conditional-nodes-4 -> origin/galv/cudagraphs-conditional-nodes-4 2025-12-04T08:57:43.5430469Z * [new branch] georgehong/cmakelists-patch -> origin/georgehong/cmakelists-patch 
2025-12-04T08:57:43.5432870Z * [new branch] gh/AlnisM/1/base -> origin/gh/AlnisM/1/base 2025-12-04T08:57:43.5433907Z * [new branch] gh/AlnisM/1/head -> origin/gh/AlnisM/1/head 2025-12-04T08:57:43.5435910Z * [new branch] gh/EikanWang/67/base -> origin/gh/EikanWang/67/base 2025-12-04T08:57:43.5437013Z * [new branch] gh/EikanWang/67/head -> origin/gh/EikanWang/67/head 2025-12-04T08:57:43.5439268Z * [new branch] gh/Gasoonjia/1/base -> origin/gh/Gasoonjia/1/base 2025-12-04T08:57:43.5440341Z * [new branch] gh/Gasoonjia/1/head -> origin/gh/Gasoonjia/1/head 2025-12-04T08:57:43.5442224Z * [new branch] gh/H-Huang/131/base -> origin/gh/H-Huang/131/base 2025-12-04T08:57:43.5443295Z * [new branch] gh/H-Huang/131/head -> origin/gh/H-Huang/131/head 2025-12-04T08:57:43.5444419Z * [new branch] gh/H-Huang/131/orig -> origin/gh/H-Huang/131/orig 2025-12-04T08:57:43.5445972Z * [new branch] gh/H-Huang/132/base -> origin/gh/H-Huang/132/base 2025-12-04T08:57:43.5447032Z * [new branch] gh/H-Huang/132/head -> origin/gh/H-Huang/132/head 2025-12-04T08:57:43.5448137Z * [new branch] gh/H-Huang/132/orig -> origin/gh/H-Huang/132/orig 2025-12-04T08:57:43.5449713Z * [new branch] gh/H-Huang/180/base -> origin/gh/H-Huang/180/base 2025-12-04T08:57:43.5450827Z * [new branch] gh/H-Huang/180/head -> origin/gh/H-Huang/180/head 2025-12-04T08:57:43.5451802Z * [new branch] gh/H-Huang/180/orig -> origin/gh/H-Huang/180/orig 2025-12-04T08:57:43.5453544Z * [new branch] gh/H-Huang/182/base -> origin/gh/H-Huang/182/base 2025-12-04T08:57:43.5454730Z * [new branch] gh/H-Huang/182/head -> origin/gh/H-Huang/182/head 2025-12-04T08:57:43.5455867Z * [new branch] gh/H-Huang/182/orig -> origin/gh/H-Huang/182/orig 2025-12-04T08:57:43.5457442Z * [new branch] gh/H-Huang/226/base -> origin/gh/H-Huang/226/base 2025-12-04T08:57:43.5458892Z * [new branch] gh/H-Huang/226/head -> origin/gh/H-Huang/226/head 2025-12-04T08:57:43.5459991Z * [new branch] gh/H-Huang/226/orig -> origin/gh/H-Huang/226/orig 2025-12-04T08:57:43.5461548Z * [new branch] gh/H-Huang/228/base -> origin/gh/H-Huang/228/base 2025-12-04T08:57:43.5462645Z * [new branch] gh/H-Huang/228/head -> origin/gh/H-Huang/228/head 2025-12-04T08:57:43.5463802Z * [new branch] gh/H-Huang/228/orig -> origin/gh/H-Huang/228/orig 2025-12-04T08:57:43.5465939Z * [new branch] gh/IvanKobzarev/150/base -> origin/gh/IvanKobzarev/150/base 2025-12-04T08:57:43.5467418Z * [new branch] gh/IvanKobzarev/150/head -> origin/gh/IvanKobzarev/150/head 2025-12-04T08:57:43.5468439Z * [new branch] gh/IvanKobzarev/150/orig -> origin/gh/IvanKobzarev/150/orig 2025-12-04T08:57:43.5470140Z * [new branch] gh/IvanKobzarev/157/base -> origin/gh/IvanKobzarev/157/base 2025-12-04T08:57:43.5471225Z * [new branch] gh/IvanKobzarev/157/head -> origin/gh/IvanKobzarev/157/head 2025-12-04T08:57:43.5472471Z * [new branch] gh/IvanKobzarev/157/orig -> origin/gh/IvanKobzarev/157/orig 2025-12-04T08:57:43.5474143Z * [new branch] gh/IvanKobzarev/159/base -> origin/gh/IvanKobzarev/159/base 2025-12-04T08:57:43.5475173Z * [new branch] gh/IvanKobzarev/159/head -> origin/gh/IvanKobzarev/159/head 2025-12-04T08:57:43.5476268Z * [new branch] gh/IvanKobzarev/159/orig -> origin/gh/IvanKobzarev/159/orig 2025-12-04T08:57:43.5477857Z * [new branch] gh/IvanKobzarev/162/base -> origin/gh/IvanKobzarev/162/base 2025-12-04T08:57:43.5479405Z * [new branch] gh/IvanKobzarev/162/head -> origin/gh/IvanKobzarev/162/head 2025-12-04T08:57:43.5480615Z * [new branch] gh/IvanKobzarev/162/orig -> origin/gh/IvanKobzarev/162/orig 2025-12-04T08:57:43.5482274Z * [new branch] 
gh/IvanKobzarev/163/base -> origin/gh/IvanKobzarev/163/base 2025-12-04T08:57:43.5483296Z * [new branch] gh/IvanKobzarev/163/head -> origin/gh/IvanKobzarev/163/head 2025-12-04T08:57:43.5484404Z * [new branch] gh/IvanKobzarev/163/orig -> origin/gh/IvanKobzarev/163/orig 2025-12-04T08:57:43.5486078Z * [new branch] gh/IvanKobzarev/166/base -> origin/gh/IvanKobzarev/166/base 2025-12-04T08:57:43.5487158Z * [new branch] gh/IvanKobzarev/166/head -> origin/gh/IvanKobzarev/166/head 2025-12-04T08:57:43.5488315Z * [new branch] gh/IvanKobzarev/166/orig -> origin/gh/IvanKobzarev/166/orig 2025-12-04T08:57:43.5489950Z * [new branch] gh/IvanKobzarev/167/base -> origin/gh/IvanKobzarev/167/base 2025-12-04T08:57:43.5490950Z * [new branch] gh/IvanKobzarev/167/head -> origin/gh/IvanKobzarev/167/head 2025-12-04T08:57:43.5492170Z * [new branch] gh/IvanKobzarev/167/orig -> origin/gh/IvanKobzarev/167/orig 2025-12-04T08:57:43.5494035Z * [new branch] gh/IvanKobzarev/168/base -> origin/gh/IvanKobzarev/168/base 2025-12-04T08:57:43.5495104Z * [new branch] gh/IvanKobzarev/168/head -> origin/gh/IvanKobzarev/168/head 2025-12-04T08:57:43.5496431Z * [new branch] gh/IvanKobzarev/168/orig -> origin/gh/IvanKobzarev/168/orig 2025-12-04T08:57:43.5497850Z * [new branch] gh/IvanKobzarev/169/base -> origin/gh/IvanKobzarev/169/base 2025-12-04T08:57:43.5498983Z * [new branch] gh/IvanKobzarev/169/head -> origin/gh/IvanKobzarev/169/head 2025-12-04T08:57:43.5500119Z * [new branch] gh/IvanKobzarev/169/orig -> origin/gh/IvanKobzarev/169/orig 2025-12-04T08:57:43.5501560Z * [new branch] gh/IvanKobzarev/170/base -> origin/gh/IvanKobzarev/170/base 2025-12-04T08:57:43.5502645Z * [new branch] gh/IvanKobzarev/170/head -> origin/gh/IvanKobzarev/170/head 2025-12-04T08:57:43.5503794Z * [new branch] gh/IvanKobzarev/170/orig -> origin/gh/IvanKobzarev/170/orig 2025-12-04T08:57:43.5505678Z * [new branch] gh/IvanKobzarev/171/base -> origin/gh/IvanKobzarev/171/base 2025-12-04T08:57:43.5506701Z * [new branch] gh/IvanKobzarev/171/head -> origin/gh/IvanKobzarev/171/head 2025-12-04T08:57:43.5507795Z * [new branch] gh/IvanKobzarev/171/orig -> origin/gh/IvanKobzarev/171/orig 2025-12-04T08:57:43.5509378Z * [new branch] gh/IvanKobzarev/172/base -> origin/gh/IvanKobzarev/172/base 2025-12-04T08:57:43.5510526Z * [new branch] gh/IvanKobzarev/172/head -> origin/gh/IvanKobzarev/172/head 2025-12-04T08:57:43.5511672Z * [new branch] gh/IvanKobzarev/172/orig -> origin/gh/IvanKobzarev/172/orig 2025-12-04T08:57:43.5513193Z * [new branch] gh/IvanKobzarev/173/base -> origin/gh/IvanKobzarev/173/base 2025-12-04T08:57:43.5514230Z * [new branch] gh/IvanKobzarev/173/head -> origin/gh/IvanKobzarev/173/head 2025-12-04T08:57:43.5515366Z * [new branch] gh/IvanKobzarev/173/orig -> origin/gh/IvanKobzarev/173/orig 2025-12-04T08:57:43.5517021Z * [new branch] gh/IvanKobzarev/174/base -> origin/gh/IvanKobzarev/174/base 2025-12-04T08:57:43.5518135Z * [new branch] gh/IvanKobzarev/174/head -> origin/gh/IvanKobzarev/174/head 2025-12-04T08:57:43.5519226Z * [new branch] gh/IvanKobzarev/174/orig -> origin/gh/IvanKobzarev/174/orig 2025-12-04T08:57:43.5520786Z * [new branch] gh/IvanKobzarev/175/base -> origin/gh/IvanKobzarev/175/base 2025-12-04T08:57:43.5522018Z * [new branch] gh/IvanKobzarev/175/head -> origin/gh/IvanKobzarev/175/head 2025-12-04T08:57:43.5523070Z * [new branch] gh/IvanKobzarev/175/orig -> origin/gh/IvanKobzarev/175/orig 2025-12-04T08:57:43.5524869Z * [new branch] gh/IvanKobzarev/176/base -> origin/gh/IvanKobzarev/176/base 2025-12-04T08:57:43.5525923Z * [new branch] 
gh/IvanKobzarev/176/head -> origin/gh/IvanKobzarev/176/head 2025-12-04T08:57:43.5527035Z * [new branch] gh/IvanKobzarev/176/orig -> origin/gh/IvanKobzarev/176/orig 2025-12-04T08:57:43.5528845Z * [new branch] gh/IvanKobzarev/177/base -> origin/gh/IvanKobzarev/177/base 2025-12-04T08:57:43.5529922Z * [new branch] gh/IvanKobzarev/177/head -> origin/gh/IvanKobzarev/177/head 2025-12-04T08:57:43.5531099Z * [new branch] gh/IvanKobzarev/177/orig -> origin/gh/IvanKobzarev/177/orig 2025-12-04T08:57:43.5532730Z * [new branch] gh/IvanKobzarev/178/base -> origin/gh/IvanKobzarev/178/base 2025-12-04T08:57:43.5534245Z * [new branch] gh/IvanKobzarev/178/head -> origin/gh/IvanKobzarev/178/head 2025-12-04T08:57:43.5535429Z * [new branch] gh/IvanKobzarev/178/orig -> origin/gh/IvanKobzarev/178/orig 2025-12-04T08:57:43.5537101Z * [new branch] gh/IvanKobzarev/179/base -> origin/gh/IvanKobzarev/179/base 2025-12-04T08:57:43.5538122Z * [new branch] gh/IvanKobzarev/179/head -> origin/gh/IvanKobzarev/179/head 2025-12-04T08:57:43.5539415Z * [new branch] gh/IvanKobzarev/179/orig -> origin/gh/IvanKobzarev/179/orig 2025-12-04T08:57:43.5541284Z * [new branch] gh/IvanKobzarev/180/base -> origin/gh/IvanKobzarev/180/base 2025-12-04T08:57:43.5542348Z * [new branch] gh/IvanKobzarev/180/head -> origin/gh/IvanKobzarev/180/head 2025-12-04T08:57:43.5543568Z * [new branch] gh/IvanKobzarev/180/orig -> origin/gh/IvanKobzarev/180/orig 2025-12-04T08:57:43.5545487Z * [new branch] gh/IvanKobzarev/181/base -> origin/gh/IvanKobzarev/181/base 2025-12-04T08:57:43.5546624Z * [new branch] gh/IvanKobzarev/181/head -> origin/gh/IvanKobzarev/181/head 2025-12-04T08:57:43.5547751Z * [new branch] gh/IvanKobzarev/181/orig -> origin/gh/IvanKobzarev/181/orig 2025-12-04T08:57:43.5549579Z * [new branch] gh/IvanKobzarev/182/base -> origin/gh/IvanKobzarev/182/base 2025-12-04T08:57:43.5550638Z * [new branch] gh/IvanKobzarev/182/head -> origin/gh/IvanKobzarev/182/head 2025-12-04T08:57:43.5551736Z * [new branch] gh/IvanKobzarev/182/orig -> origin/gh/IvanKobzarev/182/orig 2025-12-04T08:57:43.5553448Z * [new branch] gh/IvanKobzarev/183/base -> origin/gh/IvanKobzarev/183/base 2025-12-04T08:57:43.5554508Z * [new branch] gh/IvanKobzarev/183/head -> origin/gh/IvanKobzarev/183/head 2025-12-04T08:57:43.5555668Z * [new branch] gh/IvanKobzarev/183/orig -> origin/gh/IvanKobzarev/183/orig 2025-12-04T08:57:43.5557305Z * [new branch] gh/IvanKobzarev/184/base -> origin/gh/IvanKobzarev/184/base 2025-12-04T08:57:43.5558353Z * [new branch] gh/IvanKobzarev/184/head -> origin/gh/IvanKobzarev/184/head 2025-12-04T08:57:43.5559526Z * [new branch] gh/IvanKobzarev/184/orig -> origin/gh/IvanKobzarev/184/orig 2025-12-04T08:57:43.5561387Z * [new branch] gh/NikhilAPatel/1/base -> origin/gh/NikhilAPatel/1/base 2025-12-04T08:57:43.5562565Z * [new branch] gh/NikhilAPatel/1/head -> origin/gh/NikhilAPatel/1/head 2025-12-04T08:57:43.5563949Z * [new branch] gh/NikhilAPatel/2/base -> origin/gh/NikhilAPatel/2/base 2025-12-04T08:57:43.5564936Z * [new branch] gh/NikhilAPatel/2/head -> origin/gh/NikhilAPatel/2/head 2025-12-04T08:57:43.5566649Z * [new branch] gh/NikhilAPatel/4/base -> origin/gh/NikhilAPatel/4/base 2025-12-04T08:57:43.5567787Z * [new branch] gh/NikhilAPatel/4/head -> origin/gh/NikhilAPatel/4/head 2025-12-04T08:57:43.5569326Z * [new branch] gh/NikhilAPatel/5/base -> origin/gh/NikhilAPatel/5/base 2025-12-04T08:57:43.5570368Z * [new branch] gh/NikhilAPatel/5/head -> origin/gh/NikhilAPatel/5/head 2025-12-04T08:57:43.5571486Z * [new branch] gh/NikhilAPatel/5/orig -> 
origin/gh/NikhilAPatel/5/orig 2025-12-04T08:57:43.5573564Z * [new branch] gh/PaliC/17/base -> origin/gh/PaliC/17/base 2025-12-04T08:57:43.5574752Z * [new branch] gh/PaliC/17/head -> origin/gh/PaliC/17/head 2025-12-04T08:57:43.5575897Z * [new branch] gh/PaliC/17/orig -> origin/gh/PaliC/17/orig 2025-12-04T08:57:43.5577504Z * [new branch] gh/PaliC/18/base -> origin/gh/PaliC/18/base 2025-12-04T08:57:43.5578575Z * [new branch] gh/PaliC/18/head -> origin/gh/PaliC/18/head 2025-12-04T08:57:43.5580019Z * [new branch] gh/PaliC/18/orig -> origin/gh/PaliC/18/orig 2025-12-04T08:57:43.5581654Z * [new branch] gh/PaliC/20/base -> origin/gh/PaliC/20/base 2025-12-04T08:57:43.5582786Z * [new branch] gh/PaliC/20/head -> origin/gh/PaliC/20/head 2025-12-04T08:57:43.5583901Z * [new branch] gh/PaliC/20/orig -> origin/gh/PaliC/20/orig 2025-12-04T08:57:43.5585508Z * [new branch] gh/PaliC/21/base -> origin/gh/PaliC/21/base 2025-12-04T08:57:43.5586576Z * [new branch] gh/PaliC/21/head -> origin/gh/PaliC/21/head 2025-12-04T08:57:43.5587829Z * [new branch] gh/PaliC/21/orig -> origin/gh/PaliC/21/orig 2025-12-04T08:57:43.5589097Z * [new branch] gh/PaliC/23/base -> origin/gh/PaliC/23/base 2025-12-04T08:57:43.5590248Z * [new branch] gh/PaliC/23/head -> origin/gh/PaliC/23/head 2025-12-04T08:57:43.5591480Z * [new branch] gh/PaliC/23/orig -> origin/gh/PaliC/23/orig 2025-12-04T08:57:43.5593027Z * [new branch] gh/PaliC/24/base -> origin/gh/PaliC/24/base 2025-12-04T08:57:43.5594037Z * [new branch] gh/PaliC/24/head -> origin/gh/PaliC/24/head 2025-12-04T08:57:43.5595117Z * [new branch] gh/PaliC/24/orig -> origin/gh/PaliC/24/orig 2025-12-04T08:57:43.5596606Z * [new branch] gh/PaliC/25/head -> origin/gh/PaliC/25/head 2025-12-04T08:57:43.5597633Z * [new branch] gh/PaliC/25/next -> origin/gh/PaliC/25/next 2025-12-04T08:57:43.5598769Z * [new branch] gh/PaliC/25/orig -> origin/gh/PaliC/25/orig 2025-12-04T08:57:43.5600264Z * [new branch] gh/PaliC/26/head -> origin/gh/PaliC/26/head 2025-12-04T08:57:43.5601185Z * [new branch] gh/PaliC/26/next -> origin/gh/PaliC/26/next 2025-12-04T08:57:43.5602314Z * [new branch] gh/PaliC/26/orig -> origin/gh/PaliC/26/orig 2025-12-04T08:57:43.5603796Z * [new branch] gh/PaliC/27/next -> origin/gh/PaliC/27/next 2025-12-04T08:57:43.5605221Z * [new branch] gh/PaliC/28/head -> origin/gh/PaliC/28/head 2025-12-04T08:57:43.5606287Z * [new branch] gh/PaliC/28/next -> origin/gh/PaliC/28/next 2025-12-04T08:57:43.5607417Z * [new branch] gh/PaliC/28/orig -> origin/gh/PaliC/28/orig 2025-12-04T08:57:43.5608913Z * [new branch] gh/PaliC/29/head -> origin/gh/PaliC/29/head 2025-12-04T08:57:43.5609832Z * [new branch] gh/PaliC/29/next -> origin/gh/PaliC/29/next 2025-12-04T08:57:43.5610933Z * [new branch] gh/PaliC/29/orig -> origin/gh/PaliC/29/orig 2025-12-04T08:57:43.5612463Z * [new branch] gh/PaliC/30/head -> origin/gh/PaliC/30/head 2025-12-04T08:57:43.5613598Z * [new branch] gh/PaliC/30/next -> origin/gh/PaliC/30/next 2025-12-04T08:57:43.5614796Z * [new branch] gh/PaliC/30/orig -> origin/gh/PaliC/30/orig 2025-12-04T08:57:43.5616319Z * [new branch] gh/PaliC/31/head -> origin/gh/PaliC/31/head 2025-12-04T08:57:43.5617299Z * [new branch] gh/PaliC/31/next -> origin/gh/PaliC/31/next 2025-12-04T08:57:43.5618415Z * [new branch] gh/PaliC/31/orig -> origin/gh/PaliC/31/orig 2025-12-04T08:57:43.5620277Z * [new branch] gh/PaulZhang12/25/base -> origin/gh/PaulZhang12/25/base 2025-12-04T08:57:43.5621454Z * [new branch] gh/PaulZhang12/25/head -> origin/gh/PaulZhang12/25/head 2025-12-04T08:57:43.5622655Z * [new branch] gh/PaulZhang12/25/orig 
-> origin/gh/PaulZhang12/25/orig 2025-12-04T08:57:43.5624263Z * [new branch] gh/PaulZhang12/28/base -> origin/gh/PaulZhang12/28/base 2025-12-04T08:57:43.5625539Z * [new branch] gh/PaulZhang12/28/head -> origin/gh/PaulZhang12/28/head 2025-12-04T08:57:43.5626655Z * [new branch] gh/PaulZhang12/28/orig -> origin/gh/PaulZhang12/28/orig 2025-12-04T08:57:43.5628446Z * [new branch] gh/PaulZhang12/31/base -> origin/gh/PaulZhang12/31/base 2025-12-04T08:57:43.5629436Z * [new branch] gh/PaulZhang12/31/head -> origin/gh/PaulZhang12/31/head 2025-12-04T08:57:43.5630654Z * [new branch] gh/PaulZhang12/31/orig -> origin/gh/PaulZhang12/31/orig 2025-12-04T08:57:43.5632771Z * [new branch] gh/PaulZhang12/37/base -> origin/gh/PaulZhang12/37/base 2025-12-04T08:57:43.5634034Z * [new branch] gh/PaulZhang12/37/head -> origin/gh/PaulZhang12/37/head 2025-12-04T08:57:43.5634769Z * [new branch] gh/PaulZhang12/37/orig -> origin/gh/PaulZhang12/37/orig 2025-12-04T08:57:43.5635772Z * [new branch] gh/PaulZhang12/40/base -> origin/gh/PaulZhang12/40/base 2025-12-04T08:57:43.5636850Z * [new branch] gh/PaulZhang12/40/head -> origin/gh/PaulZhang12/40/head 2025-12-04T08:57:43.5637932Z * [new branch] gh/PaulZhang12/40/orig -> origin/gh/PaulZhang12/40/orig 2025-12-04T08:57:43.5639461Z * [new branch] gh/PaulZhang12/42/base -> origin/gh/PaulZhang12/42/base 2025-12-04T08:57:43.5640543Z * [new branch] gh/PaulZhang12/42/head -> origin/gh/PaulZhang12/42/head 2025-12-04T08:57:43.5642180Z * [new branch] gh/PaulZhang12/43/base -> origin/gh/PaulZhang12/43/base 2025-12-04T08:57:43.5643196Z * [new branch] gh/PaulZhang12/43/head -> origin/gh/PaulZhang12/43/head 2025-12-04T08:57:43.5644285Z * [new branch] gh/PaulZhang12/43/orig -> origin/gh/PaulZhang12/43/orig 2025-12-04T08:57:43.5645698Z * [new branch] gh/PaulZhang12/44/base -> origin/gh/PaulZhang12/44/base 2025-12-04T08:57:43.5646798Z * [new branch] gh/PaulZhang12/44/head -> origin/gh/PaulZhang12/44/head 2025-12-04T08:57:43.5648446Z * [new branch] gh/PaulZhang12/45/base -> origin/gh/PaulZhang12/45/base 2025-12-04T08:57:43.5649431Z * [new branch] gh/PaulZhang12/45/head -> origin/gh/PaulZhang12/45/head 2025-12-04T08:57:43.5650492Z * [new branch] gh/PaulZhang12/45/orig -> origin/gh/PaulZhang12/45/orig 2025-12-04T08:57:43.5652080Z * [new branch] gh/PaulZhang12/46/base -> origin/gh/PaulZhang12/46/base 2025-12-04T08:57:43.5653226Z * [new branch] gh/PaulZhang12/46/head -> origin/gh/PaulZhang12/46/head 2025-12-04T08:57:43.5654649Z * [new branch] gh/PaulZhang12/46/orig -> origin/gh/PaulZhang12/46/orig 2025-12-04T08:57:43.5656239Z * [new branch] gh/PaulZhang12/47/base -> origin/gh/PaulZhang12/47/base 2025-12-04T08:57:43.5657435Z * [new branch] gh/PaulZhang12/47/head -> origin/gh/PaulZhang12/47/head 2025-12-04T08:57:43.5658624Z * [new branch] gh/PaulZhang12/47/orig -> origin/gh/PaulZhang12/47/orig 2025-12-04T08:57:43.5660031Z * [new branch] gh/PaulZhang12/48/base -> origin/gh/PaulZhang12/48/base 2025-12-04T08:57:43.5661077Z * [new branch] gh/PaulZhang12/48/head -> origin/gh/PaulZhang12/48/head 2025-12-04T08:57:43.5662240Z * [new branch] gh/PaulZhang12/48/orig -> origin/gh/PaulZhang12/48/orig 2025-12-04T08:57:43.5664097Z * [new branch] gh/SamGinzburg/11/base -> origin/gh/SamGinzburg/11/base 2025-12-04T08:57:43.5665226Z * [new branch] gh/SamGinzburg/11/head -> origin/gh/SamGinzburg/11/head 2025-12-04T08:57:43.5667347Z * [new branch] gh/SherlockNoMad/1/base -> origin/gh/SherlockNoMad/1/base 2025-12-04T08:57:43.5668367Z * [new branch] gh/SherlockNoMad/1/head -> origin/gh/SherlockNoMad/1/head 
2025-12-04T08:57:43.5669978Z * [new branch] gh/SherlockNoMad/10/base -> origin/gh/SherlockNoMad/10/base 2025-12-04T08:57:43.5671285Z * [new branch] gh/SherlockNoMad/10/head -> origin/gh/SherlockNoMad/10/head 2025-12-04T08:57:43.5672523Z * [new branch] gh/SherlockNoMad/10/orig -> origin/gh/SherlockNoMad/10/orig 2025-12-04T08:57:43.5673929Z * [new branch] gh/SherlockNoMad/11/base -> origin/gh/SherlockNoMad/11/base 2025-12-04T08:57:43.5674985Z * [new branch] gh/SherlockNoMad/11/head -> origin/gh/SherlockNoMad/11/head 2025-12-04T08:57:43.5676126Z * [new branch] gh/SherlockNoMad/11/orig -> origin/gh/SherlockNoMad/11/orig 2025-12-04T08:57:43.5677639Z * [new branch] gh/SherlockNoMad/12/base -> origin/gh/SherlockNoMad/12/base 2025-12-04T08:57:43.5678517Z * [new branch] gh/SherlockNoMad/12/head -> origin/gh/SherlockNoMad/12/head 2025-12-04T08:57:43.5680029Z * [new branch] gh/SherlockNoMad/12/orig -> origin/gh/SherlockNoMad/12/orig 2025-12-04T08:57:43.5681671Z * [new branch] gh/SherlockNoMad/15/base -> origin/gh/SherlockNoMad/15/base 2025-12-04T08:57:43.5682735Z * [new branch] gh/SherlockNoMad/15/head -> origin/gh/SherlockNoMad/15/head 2025-12-04T08:57:43.5683871Z * [new branch] gh/SherlockNoMad/15/orig -> origin/gh/SherlockNoMad/15/orig 2025-12-04T08:57:43.5685482Z * [new branch] gh/SherlockNoMad/17/base -> origin/gh/SherlockNoMad/17/base 2025-12-04T08:57:43.5686538Z * [new branch] gh/SherlockNoMad/17/head -> origin/gh/SherlockNoMad/17/head 2025-12-04T08:57:43.5687691Z * [new branch] gh/SherlockNoMad/17/orig -> origin/gh/SherlockNoMad/17/orig 2025-12-04T08:57:43.5689418Z * [new branch] gh/SherlockNoMad/18/base -> origin/gh/SherlockNoMad/18/base 2025-12-04T08:57:43.5690544Z * [new branch] gh/SherlockNoMad/18/head -> origin/gh/SherlockNoMad/18/head 2025-12-04T08:57:43.5691828Z * [new branch] gh/SherlockNoMad/18/orig -> origin/gh/SherlockNoMad/18/orig 2025-12-04T08:57:43.5693314Z * [new branch] gh/SherlockNoMad/19/base -> origin/gh/SherlockNoMad/19/base 2025-12-04T08:57:43.5694692Z * [new branch] gh/SherlockNoMad/19/head -> origin/gh/SherlockNoMad/19/head 2025-12-04T08:57:43.5695842Z * [new branch] gh/SherlockNoMad/19/orig -> origin/gh/SherlockNoMad/19/orig 2025-12-04T08:57:43.5697307Z * [new branch] gh/SherlockNoMad/2/base -> origin/gh/SherlockNoMad/2/base 2025-12-04T08:57:43.5698305Z * [new branch] gh/SherlockNoMad/2/head -> origin/gh/SherlockNoMad/2/head 2025-12-04T08:57:43.5699791Z * [new branch] gh/SherlockNoMad/20/base -> origin/gh/SherlockNoMad/20/base 2025-12-04T08:57:43.5700954Z * [new branch] gh/SherlockNoMad/20/head -> origin/gh/SherlockNoMad/20/head 2025-12-04T08:57:43.5701989Z * [new branch] gh/SherlockNoMad/20/orig -> origin/gh/SherlockNoMad/20/orig 2025-12-04T08:57:43.5703826Z * [new branch] gh/SherlockNoMad/21/base -> origin/gh/SherlockNoMad/21/base 2025-12-04T08:57:43.5705015Z * [new branch] gh/SherlockNoMad/21/head -> origin/gh/SherlockNoMad/21/head 2025-12-04T08:57:43.5706175Z * [new branch] gh/SherlockNoMad/21/orig -> origin/gh/SherlockNoMad/21/orig 2025-12-04T08:57:43.5707558Z * [new branch] gh/SherlockNoMad/3/base -> origin/gh/SherlockNoMad/3/base 2025-12-04T08:57:43.5708564Z * [new branch] gh/SherlockNoMad/3/head -> origin/gh/SherlockNoMad/3/head 2025-12-04T08:57:43.5709985Z * [new branch] gh/SherlockNoMad/4/base -> origin/gh/SherlockNoMad/4/base 2025-12-04T08:57:43.5710943Z * [new branch] gh/SherlockNoMad/4/head -> origin/gh/SherlockNoMad/4/head 2025-12-04T08:57:43.5712357Z * [new branch] gh/SherlockNoMad/5/base -> origin/gh/SherlockNoMad/5/base 2025-12-04T08:57:43.5713367Z * 
[new branch] gh/SherlockNoMad/5/head -> origin/gh/SherlockNoMad/5/head 2025-12-04T08:57:43.5715604Z * [new branch] gh/Sidharth123-cpu/24/base -> origin/gh/Sidharth123-cpu/24/base 2025-12-04T08:57:43.5717018Z * [new branch] gh/Sidharth123-cpu/25/base -> origin/gh/Sidharth123-cpu/25/base 2025-12-04T08:57:43.5718323Z * [new branch] gh/Sidharth123-cpu/26/base -> origin/gh/Sidharth123-cpu/26/base 2025-12-04T08:57:43.5719989Z * [new branch] gh/Sidharth123-cpu/27/base -> origin/gh/Sidharth123-cpu/27/base 2025-12-04T08:57:43.5721782Z * [new branch] gh/StrongerXi/1/base -> origin/gh/StrongerXi/1/base 2025-12-04T08:57:43.5722961Z * [new branch] gh/StrongerXi/1/head -> origin/gh/StrongerXi/1/head 2025-12-04T08:57:43.5724390Z * [new branch] gh/StrongerXi/71/base -> origin/gh/StrongerXi/71/base 2025-12-04T08:57:43.5725418Z * [new branch] gh/StrongerXi/71/head -> origin/gh/StrongerXi/71/head 2025-12-04T08:57:43.5726824Z * [new branch] gh/StrongerXi/72/base -> origin/gh/StrongerXi/72/base 2025-12-04T08:57:43.5727923Z * [new branch] gh/StrongerXi/72/head -> origin/gh/StrongerXi/72/head 2025-12-04T08:57:43.5729352Z * [new branch] gh/StrongerXi/73/base -> origin/gh/StrongerXi/73/base 2025-12-04T08:57:43.5730361Z * [new branch] gh/StrongerXi/73/head -> origin/gh/StrongerXi/73/head 2025-12-04T08:57:43.5731497Z * [new branch] gh/StrongerXi/73/orig -> origin/gh/StrongerXi/73/orig 2025-12-04T08:57:43.5733719Z * [new branch] gh/XilunWu/160/base -> origin/gh/XilunWu/160/base 2025-12-04T08:57:43.5734840Z * [new branch] gh/XilunWu/160/head -> origin/gh/XilunWu/160/head 2025-12-04T08:57:43.5736001Z * [new branch] gh/XilunWu/160/orig -> origin/gh/XilunWu/160/orig 2025-12-04T08:57:43.5737583Z * [new branch] gh/XilunWu/163/base -> origin/gh/XilunWu/163/base 2025-12-04T08:57:43.5738860Z * [new branch] gh/XilunWu/163/head -> origin/gh/XilunWu/163/head 2025-12-04T08:57:43.5740014Z * [new branch] gh/XilunWu/163/orig -> origin/gh/XilunWu/163/orig 2025-12-04T08:57:43.5741706Z * [new branch] gh/XilunWu/168/base -> origin/gh/XilunWu/168/base 2025-12-04T08:57:43.5742733Z * [new branch] gh/XilunWu/168/head -> origin/gh/XilunWu/168/head 2025-12-04T08:57:43.5743857Z * [new branch] gh/XilunWu/168/orig -> origin/gh/XilunWu/168/orig 2025-12-04T08:57:43.5745695Z * [new branch] gh/XilunWu/169/base -> origin/gh/XilunWu/169/base 2025-12-04T08:57:43.5746702Z * [new branch] gh/XilunWu/169/head -> origin/gh/XilunWu/169/head 2025-12-04T08:57:43.5747806Z * [new branch] gh/XilunWu/169/orig -> origin/gh/XilunWu/169/orig 2025-12-04T08:57:43.5749186Z * [new branch] gh/XilunWu/170/base -> origin/gh/XilunWu/170/base 2025-12-04T08:57:43.5750215Z * [new branch] gh/XilunWu/170/head -> origin/gh/XilunWu/170/head 2025-12-04T08:57:43.5751323Z * [new branch] gh/XilunWu/170/orig -> origin/gh/XilunWu/170/orig 2025-12-04T08:57:43.5752970Z * [new branch] gh/XilunWu/171/base -> origin/gh/XilunWu/171/base 2025-12-04T08:57:43.5754052Z * [new branch] gh/XilunWu/171/head -> origin/gh/XilunWu/171/head 2025-12-04T08:57:43.5755151Z * [new branch] gh/XilunWu/171/orig -> origin/gh/XilunWu/171/orig 2025-12-04T08:57:43.5756598Z * [new branch] gh/XilunWu/173/base -> origin/gh/XilunWu/173/base 2025-12-04T08:57:43.5757669Z * [new branch] gh/XilunWu/173/head -> origin/gh/XilunWu/173/head 2025-12-04T08:57:43.5758800Z * [new branch] gh/XilunWu/173/orig -> origin/gh/XilunWu/173/orig 2025-12-04T08:57:43.5760341Z * [new branch] gh/XilunWu/175/base -> origin/gh/XilunWu/175/base 2025-12-04T08:57:43.5761419Z * [new branch] gh/XilunWu/175/head -> origin/gh/XilunWu/175/head 
2025-12-04T08:57:43.5762504Z * [new branch] gh/XilunWu/175/orig -> origin/gh/XilunWu/175/orig 2025-12-04T08:57:43.5764032Z * [new branch] gh/XilunWu/176/base -> origin/gh/XilunWu/176/base 2025-12-04T08:57:43.5765117Z * [new branch] gh/XilunWu/176/head -> origin/gh/XilunWu/176/head 2025-12-04T08:57:43.5766315Z * [new branch] gh/XilunWu/176/orig -> origin/gh/XilunWu/176/orig 2025-12-04T08:57:43.5768294Z * [new branch] gh/XuehaiPan/14/base -> origin/gh/XuehaiPan/14/base 2025-12-04T08:57:43.5769298Z * [new branch] gh/XuehaiPan/14/head -> origin/gh/XuehaiPan/14/head 2025-12-04T08:57:43.5770383Z * [new branch] gh/XuehaiPan/14/orig -> origin/gh/XuehaiPan/14/orig 2025-12-04T08:57:43.5771985Z * [new branch] gh/XuehaiPan/179/base -> origin/gh/XuehaiPan/179/base 2025-12-04T08:57:43.5773146Z * [new branch] gh/XuehaiPan/179/head -> origin/gh/XuehaiPan/179/head 2025-12-04T08:57:43.5774710Z * [new branch] gh/XuehaiPan/179/orig -> origin/gh/XuehaiPan/179/orig 2025-12-04T08:57:43.5776155Z * [new branch] gh/XuehaiPan/249/base -> origin/gh/XuehaiPan/249/base 2025-12-04T08:57:43.5777261Z * [new branch] gh/XuehaiPan/249/head -> origin/gh/XuehaiPan/249/head 2025-12-04T08:57:43.5778526Z * [new branch] gh/XuehaiPan/249/orig -> origin/gh/XuehaiPan/249/orig 2025-12-04T08:57:43.5780336Z * [new branch] gh/XuehaiPan/253/base -> origin/gh/XuehaiPan/253/base 2025-12-04T08:57:43.5781360Z * [new branch] gh/XuehaiPan/253/head -> origin/gh/XuehaiPan/253/head 2025-12-04T08:57:43.5782526Z * [new branch] gh/XuehaiPan/253/orig -> origin/gh/XuehaiPan/253/orig 2025-12-04T08:57:43.5784158Z * [new branch] gh/XuehaiPan/254/base -> origin/gh/XuehaiPan/254/base 2025-12-04T08:57:43.5785254Z * [new branch] gh/XuehaiPan/254/head -> origin/gh/XuehaiPan/254/head 2025-12-04T08:57:43.5786373Z * [new branch] gh/XuehaiPan/254/orig -> origin/gh/XuehaiPan/254/orig 2025-12-04T08:57:43.5787853Z * [new branch] gh/XuehaiPan/255/base -> origin/gh/XuehaiPan/255/base 2025-12-04T08:57:43.5788934Z * [new branch] gh/XuehaiPan/255/head -> origin/gh/XuehaiPan/255/head 2025-12-04T08:57:43.5790087Z * [new branch] gh/XuehaiPan/255/orig -> origin/gh/XuehaiPan/255/orig 2025-12-04T08:57:43.5791701Z * [new branch] gh/XuehaiPan/271/base -> origin/gh/XuehaiPan/271/base 2025-12-04T08:57:43.5792801Z * [new branch] gh/XuehaiPan/271/head -> origin/gh/XuehaiPan/271/head 2025-12-04T08:57:43.5793926Z * [new branch] gh/XuehaiPan/271/orig -> origin/gh/XuehaiPan/271/orig 2025-12-04T08:57:43.5795429Z * [new branch] gh/XuehaiPan/343/base -> origin/gh/XuehaiPan/343/base 2025-12-04T08:57:43.5796469Z * [new branch] gh/XuehaiPan/343/head -> origin/gh/XuehaiPan/343/head 2025-12-04T08:57:43.5797572Z * [new branch] gh/XuehaiPan/343/orig -> origin/gh/XuehaiPan/343/orig 2025-12-04T08:57:43.5799130Z * [new branch] gh/XuehaiPan/347/base -> origin/gh/XuehaiPan/347/base 2025-12-04T08:57:43.5800197Z * [new branch] gh/XuehaiPan/347/head -> origin/gh/XuehaiPan/347/head 2025-12-04T08:57:43.5801304Z * [new branch] gh/XuehaiPan/347/orig -> origin/gh/XuehaiPan/347/orig 2025-12-04T08:57:43.5802815Z * [new branch] gh/XuehaiPan/348/base -> origin/gh/XuehaiPan/348/base 2025-12-04T08:57:43.5803924Z * [new branch] gh/XuehaiPan/348/head -> origin/gh/XuehaiPan/348/head 2025-12-04T08:57:43.5805039Z * [new branch] gh/XuehaiPan/348/orig -> origin/gh/XuehaiPan/348/orig 2025-12-04T08:57:43.5806573Z * [new branch] gh/XuehaiPan/350/base -> origin/gh/XuehaiPan/350/base 2025-12-04T08:57:43.5807604Z * [new branch] gh/XuehaiPan/350/head -> origin/gh/XuehaiPan/350/head 2025-12-04T08:57:43.5808686Z * [new branch] 
gh/XuehaiPan/350/orig -> origin/gh/XuehaiPan/350/orig 2025-12-04T08:57:43.5810198Z * [new branch] gh/XuehaiPan/365/base -> origin/gh/XuehaiPan/365/base 2025-12-04T08:57:43.5811293Z * [new branch] gh/XuehaiPan/365/head -> origin/gh/XuehaiPan/365/head 2025-12-04T08:57:43.5812553Z * [new branch] gh/XuehaiPan/365/orig -> origin/gh/XuehaiPan/365/orig 2025-12-04T08:57:43.5814417Z * [new branch] gh/XuehaiPan/366/base -> origin/gh/XuehaiPan/366/base 2025-12-04T08:57:43.5815396Z * [new branch] gh/XuehaiPan/366/head -> origin/gh/XuehaiPan/366/head 2025-12-04T08:57:43.5816964Z * [new branch] gh/XuehaiPan/370/base -> origin/gh/XuehaiPan/370/base 2025-12-04T08:57:43.5818023Z * [new branch] gh/XuehaiPan/370/head -> origin/gh/XuehaiPan/370/head 2025-12-04T08:57:43.5819147Z * [new branch] gh/XuehaiPan/370/orig -> origin/gh/XuehaiPan/370/orig 2025-12-04T08:57:43.5820786Z * [new branch] gh/XuehaiPan/390/base -> origin/gh/XuehaiPan/390/base 2025-12-04T08:57:43.5821843Z * [new branch] gh/XuehaiPan/390/head -> origin/gh/XuehaiPan/390/head 2025-12-04T08:57:43.5822960Z * [new branch] gh/XuehaiPan/390/orig -> origin/gh/XuehaiPan/390/orig 2025-12-04T08:57:43.5824556Z * [new branch] gh/XuehaiPan/391/base -> origin/gh/XuehaiPan/391/base 2025-12-04T08:57:43.5825733Z * [new branch] gh/XuehaiPan/391/head -> origin/gh/XuehaiPan/391/head 2025-12-04T08:57:43.5826854Z * [new branch] gh/XuehaiPan/391/orig -> origin/gh/XuehaiPan/391/orig 2025-12-04T08:57:43.5828380Z * [new branch] gh/XuehaiPan/392/base -> origin/gh/XuehaiPan/392/base 2025-12-04T08:57:43.5829494Z * [new branch] gh/XuehaiPan/392/head -> origin/gh/XuehaiPan/392/head 2025-12-04T08:57:43.5830502Z * [new branch] gh/XuehaiPan/392/orig -> origin/gh/XuehaiPan/392/orig 2025-12-04T08:57:43.5832482Z * [new branch] gh/XuehaiPan/394/base -> origin/gh/XuehaiPan/394/base 2025-12-04T08:57:43.5833582Z * [new branch] gh/XuehaiPan/394/head -> origin/gh/XuehaiPan/394/head 2025-12-04T08:57:43.5834691Z * [new branch] gh/XuehaiPan/394/orig -> origin/gh/XuehaiPan/394/orig 2025-12-04T08:57:43.5836220Z * [new branch] gh/XuehaiPan/397/base -> origin/gh/XuehaiPan/397/base 2025-12-04T08:57:43.5837290Z * [new branch] gh/XuehaiPan/397/head -> origin/gh/XuehaiPan/397/head 2025-12-04T08:57:43.5838407Z * [new branch] gh/XuehaiPan/397/orig -> origin/gh/XuehaiPan/397/orig 2025-12-04T08:57:43.5839964Z * [new branch] gh/XuehaiPan/398/base -> origin/gh/XuehaiPan/398/base 2025-12-04T08:57:43.5840995Z * [new branch] gh/XuehaiPan/398/head -> origin/gh/XuehaiPan/398/head 2025-12-04T08:57:43.5842078Z * [new branch] gh/XuehaiPan/398/orig -> origin/gh/XuehaiPan/398/orig 2025-12-04T08:57:43.5843592Z * [new branch] gh/XuehaiPan/399/base -> origin/gh/XuehaiPan/399/base 2025-12-04T08:57:43.5844685Z * [new branch] gh/XuehaiPan/399/head -> origin/gh/XuehaiPan/399/head 2025-12-04T08:57:43.5845800Z * [new branch] gh/XuehaiPan/399/orig -> origin/gh/XuehaiPan/399/orig 2025-12-04T08:57:43.5847407Z * [new branch] gh/XuehaiPan/400/base -> origin/gh/XuehaiPan/400/base 2025-12-04T08:57:43.5848503Z * [new branch] gh/XuehaiPan/400/head -> origin/gh/XuehaiPan/400/head 2025-12-04T08:57:43.5849620Z * [new branch] gh/XuehaiPan/400/orig -> origin/gh/XuehaiPan/400/orig 2025-12-04T08:57:43.5851412Z * [new branch] gh/ZhiweiYan-96/39/base -> origin/gh/ZhiweiYan-96/39/base 2025-12-04T08:57:43.5852468Z * [new branch] gh/ZhiweiYan-96/39/head -> origin/gh/ZhiweiYan-96/39/head 2025-12-04T08:57:43.5853986Z * [new branch] gh/ZhiweiYan-96/39/orig -> origin/gh/ZhiweiYan-96/39/orig 2025-12-04T08:57:43.5855536Z * [new branch] 
gh/ZhiweiYan-96/44/base -> origin/gh/ZhiweiYan-96/44/base 2025-12-04T08:57:43.5856751Z * [new branch] gh/ZhiweiYan-96/44/head -> origin/gh/ZhiweiYan-96/44/head 2025-12-04T08:57:43.5858185Z * [new branch] gh/ZhiweiYan-96/45/base -> origin/gh/ZhiweiYan-96/45/base 2025-12-04T08:57:43.5859145Z * [new branch] gh/ZhiweiYan-96/45/head -> origin/gh/ZhiweiYan-96/45/head 2025-12-04T08:57:43.5860775Z * [new branch] gh/ZhiweiYan-96/49/base -> origin/gh/ZhiweiYan-96/49/base 2025-12-04T08:57:43.5861800Z * [new branch] gh/ZhiweiYan-96/49/head -> origin/gh/ZhiweiYan-96/49/head 2025-12-04T08:57:43.5863371Z * [new branch] gh/ZhiweiYan-96/62/base -> origin/gh/ZhiweiYan-96/62/base 2025-12-04T08:57:43.5864660Z * [new branch] gh/ZhiweiYan-96/62/head -> origin/gh/ZhiweiYan-96/62/head 2025-12-04T08:57:43.5866119Z * [new branch] gh/ZhiweiYan-96/66/base -> origin/gh/ZhiweiYan-96/66/base 2025-12-04T08:57:43.5867161Z * [new branch] gh/ZhiweiYan-96/66/head -> origin/gh/ZhiweiYan-96/66/head 2025-12-04T08:57:43.5868661Z * [new branch] gh/ZhiweiYan-96/67/base -> origin/gh/ZhiweiYan-96/67/base 2025-12-04T08:57:43.5869898Z * [new branch] gh/ZhiweiYan-96/67/head -> origin/gh/ZhiweiYan-96/67/head 2025-12-04T08:57:43.5871319Z * [new branch] gh/ZhiweiYan-96/68/base -> origin/gh/ZhiweiYan-96/68/base 2025-12-04T08:57:43.5872302Z * [new branch] gh/ZhiweiYan-96/68/head -> origin/gh/ZhiweiYan-96/68/head 2025-12-04T08:57:43.5873398Z * [new branch] gh/ZhiweiYan-96/68/orig -> origin/gh/ZhiweiYan-96/68/orig 2025-12-04T08:57:43.5875215Z * [new branch] gh/aakhundov/1/base -> origin/gh/aakhundov/1/base 2025-12-04T08:57:43.5876354Z * [new branch] gh/aakhundov/1/head -> origin/gh/aakhundov/1/head 2025-12-04T08:57:43.5877752Z * [new branch] gh/aakhundov/2/base -> origin/gh/aakhundov/2/base 2025-12-04T08:57:43.5879237Z * [new branch] gh/aakhundov/2/head -> origin/gh/aakhundov/2/head 2025-12-04T08:57:43.5880954Z * [new branch] gh/aditew01/openblas -> origin/gh/aditew01/openblas 2025-12-04T08:57:43.5882007Z * [new branch] gh/aditew01/sbgemm -> origin/gh/aditew01/sbgemm 2025-12-04T08:57:43.5883136Z * [new branch] gh/aditew01/vecbf16 -> origin/gh/aditew01/vecbf16 2025-12-04T08:57:43.5884930Z * [new branch] gh/albanD/4/base -> origin/gh/albanD/4/base 2025-12-04T08:57:43.5885992Z * [new branch] gh/albanD/4/head -> origin/gh/albanD/4/head 2025-12-04T08:57:43.5887119Z * [new branch] gh/albanD/4/orig -> origin/gh/albanD/4/orig 2025-12-04T08:57:43.5889056Z * [new branch] gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init 2025-12-04T08:57:43.5890541Z * [new branch] gh/alexsamardzic/12/base -> origin/gh/alexsamardzic/12/base 2025-12-04T08:57:43.5891805Z * [new branch] gh/alexsamardzic/12/head -> origin/gh/alexsamardzic/12/head 2025-12-04T08:57:43.5892967Z * [new branch] gh/alexsamardzic/12/orig -> origin/gh/alexsamardzic/12/orig 2025-12-04T08:57:43.5894940Z * [new branch] gh/alexsamardzic/14/base -> origin/gh/alexsamardzic/14/base 2025-12-04T08:57:43.5896007Z * [new branch] gh/alexsamardzic/14/head -> origin/gh/alexsamardzic/14/head 2025-12-04T08:57:43.5897184Z * [new branch] gh/alexsamardzic/14/orig -> origin/gh/alexsamardzic/14/orig 2025-12-04T08:57:43.5898790Z * [new branch] gh/alexsamardzic/15/base -> origin/gh/alexsamardzic/15/base 2025-12-04T08:57:43.5899875Z * [new branch] gh/alexsamardzic/15/head -> origin/gh/alexsamardzic/15/head 2025-12-04T08:57:43.5901013Z * [new branch] gh/alexsamardzic/15/orig -> origin/gh/alexsamardzic/15/orig 2025-12-04T08:57:43.5903035Z * [new branch] gh/amjames/18/base 
-> origin/gh/amjames/18/base 2025-12-04T08:57:43.5903973Z * [new branch] gh/amjames/18/head -> origin/gh/amjames/18/head 2025-12-04T08:57:43.5905121Z * [new branch] gh/amjames/18/orig -> origin/gh/amjames/18/orig 2025-12-04T08:57:43.5907276Z * [new branch] gh/andrewor14/35/base -> origin/gh/andrewor14/35/base 2025-12-04T08:57:43.5908511Z * [new branch] gh/andrewor14/35/head -> origin/gh/andrewor14/35/head 2025-12-04T08:57:43.5909651Z * [new branch] gh/andrewor14/35/orig -> origin/gh/andrewor14/35/orig 2025-12-04T08:57:43.5911358Z * [new branch] gh/andrewor14/50/base -> origin/gh/andrewor14/50/base 2025-12-04T08:57:43.5912544Z * [new branch] gh/andrewor14/50/head -> origin/gh/andrewor14/50/head 2025-12-04T08:57:43.5913723Z * [new branch] gh/andrewor14/50/orig -> origin/gh/andrewor14/50/orig 2025-12-04T08:57:43.5915595Z * [new branch] gh/andyanwang/30/base -> origin/gh/andyanwang/30/base 2025-12-04T08:57:43.5916842Z * [new branch] gh/andyanwang/30/orig -> origin/gh/andyanwang/30/orig 2025-12-04T08:57:43.5918471Z * [new branch] gh/andyanwang/31/base -> origin/gh/andyanwang/31/base 2025-12-04T08:57:43.5919696Z * [new branch] gh/andyanwang/31/orig -> origin/gh/andyanwang/31/orig 2025-12-04T08:57:43.5921277Z * [new branch] gh/andyanwang/39/base -> origin/gh/andyanwang/39/base 2025-12-04T08:57:43.5922426Z * [new branch] gh/andyanwang/39/head -> origin/gh/andyanwang/39/head 2025-12-04T08:57:43.5923543Z * [new branch] gh/andyanwang/39/orig -> origin/gh/andyanwang/39/orig 2025-12-04T08:57:43.5925300Z * [new branch] gh/andyanwang/42/base -> origin/gh/andyanwang/42/base 2025-12-04T08:57:43.5926321Z * [new branch] gh/andyanwang/42/head -> origin/gh/andyanwang/42/head 2025-12-04T08:57:43.5927453Z * [new branch] gh/andyanwang/42/orig -> origin/gh/andyanwang/42/orig 2025-12-04T08:57:43.5929148Z * [new branch] gh/andyanwang/45/base -> origin/gh/andyanwang/45/base 2025-12-04T08:57:43.5930262Z * [new branch] gh/andyanwang/45/head -> origin/gh/andyanwang/45/head 2025-12-04T08:57:43.5931372Z * [new branch] gh/andyanwang/45/orig -> origin/gh/andyanwang/45/orig 2025-12-04T08:57:43.5933239Z * [new branch] gh/angelayi/107/base -> origin/gh/angelayi/107/base 2025-12-04T08:57:43.5934660Z * [new branch] gh/angelayi/107/head -> origin/gh/angelayi/107/head 2025-12-04T08:57:43.5936186Z * [new branch] gh/angelayi/114/base -> origin/gh/angelayi/114/base 2025-12-04T08:57:43.5937331Z * [new branch] gh/angelayi/114/head -> origin/gh/angelayi/114/head 2025-12-04T08:57:43.5938482Z * [new branch] gh/angelayi/114/orig -> origin/gh/angelayi/114/orig 2025-12-04T08:57:43.5940194Z * [new branch] gh/angelayi/116/base -> origin/gh/angelayi/116/base 2025-12-04T08:57:43.5941303Z * [new branch] gh/angelayi/116/head -> origin/gh/angelayi/116/head 2025-12-04T08:57:43.5942385Z * [new branch] gh/angelayi/116/orig -> origin/gh/angelayi/116/orig 2025-12-04T08:57:43.5944133Z * [new branch] gh/angelayi/122/base -> origin/gh/angelayi/122/base 2025-12-04T08:57:43.5945139Z * [new branch] gh/angelayi/122/head -> origin/gh/angelayi/122/head 2025-12-04T08:57:43.5946394Z * [new branch] gh/angelayi/122/orig -> origin/gh/angelayi/122/orig 2025-12-04T08:57:43.5948090Z * [new branch] gh/angelayi/124/base -> origin/gh/angelayi/124/base 2025-12-04T08:57:43.5949125Z * [new branch] gh/angelayi/124/head -> origin/gh/angelayi/124/head 2025-12-04T08:57:43.5950384Z * [new branch] gh/angelayi/124/orig -> origin/gh/angelayi/124/orig 2025-12-04T08:57:43.5951912Z * [new branch] gh/angelayi/128/base -> origin/gh/angelayi/128/base 2025-12-04T08:57:43.5953016Z * [new 
branch] gh/angelayi/128/head -> origin/gh/angelayi/128/head 2025-12-04T08:57:43.5954061Z * [new branch] gh/angelayi/128/orig -> origin/gh/angelayi/128/orig 2025-12-04T08:57:43.5955660Z * [new branch] gh/angelayi/131/base -> origin/gh/angelayi/131/base 2025-12-04T08:57:43.5956722Z * [new branch] gh/angelayi/131/head -> origin/gh/angelayi/131/head 2025-12-04T08:57:43.5957853Z * [new branch] gh/angelayi/131/orig -> origin/gh/angelayi/131/orig 2025-12-04T08:57:43.5959666Z * [new branch] gh/angelayi/132/base -> origin/gh/angelayi/132/base 2025-12-04T08:57:43.5960893Z * [new branch] gh/angelayi/132/head -> origin/gh/angelayi/132/head 2025-12-04T08:57:43.5962284Z * [new branch] gh/angelayi/132/orig -> origin/gh/angelayi/132/orig 2025-12-04T08:57:43.5963629Z * [new branch] gh/angelayi/133/base -> origin/gh/angelayi/133/base 2025-12-04T08:57:43.5964689Z * [new branch] gh/angelayi/133/head -> origin/gh/angelayi/133/head 2025-12-04T08:57:43.5965807Z * [new branch] gh/angelayi/133/orig -> origin/gh/angelayi/133/orig 2025-12-04T08:57:43.5967591Z * [new branch] gh/angelayi/134/base -> origin/gh/angelayi/134/base 2025-12-04T08:57:43.5968899Z * [new branch] gh/angelayi/134/head -> origin/gh/angelayi/134/head 2025-12-04T08:57:43.5969941Z * [new branch] gh/angelayi/134/orig -> origin/gh/angelayi/134/orig 2025-12-04T08:57:43.5971735Z * [new branch] gh/angelayi/135/base -> origin/gh/angelayi/135/base 2025-12-04T08:57:43.5972875Z * [new branch] gh/angelayi/135/head -> origin/gh/angelayi/135/head 2025-12-04T08:57:43.5974409Z * [new branch] gh/angelayi/135/orig -> origin/gh/angelayi/135/orig 2025-12-04T08:57:43.5975974Z * [new branch] gh/angelayi/136/base -> origin/gh/angelayi/136/base 2025-12-04T08:57:43.5977053Z * [new branch] gh/angelayi/136/head -> origin/gh/angelayi/136/head 2025-12-04T08:57:43.5978222Z * [new branch] gh/angelayi/136/orig -> origin/gh/angelayi/136/orig 2025-12-04T08:57:43.5982657Z * [new branch] gh/angelayi/137/base -> origin/gh/angelayi/137/base 2025-12-04T08:57:43.5983732Z * [new branch] gh/angelayi/137/head -> origin/gh/angelayi/137/head 2025-12-04T08:57:43.5985142Z * [new branch] gh/angelayi/137/orig -> origin/gh/angelayi/137/orig 2025-12-04T08:57:43.5986597Z * [new branch] gh/angelayi/138/base -> origin/gh/angelayi/138/base 2025-12-04T08:57:43.5987640Z * [new branch] gh/angelayi/138/head -> origin/gh/angelayi/138/head 2025-12-04T08:57:43.5988793Z * [new branch] gh/angelayi/138/orig -> origin/gh/angelayi/138/orig 2025-12-04T08:57:43.5990421Z * [new branch] gh/angelayi/139/base -> origin/gh/angelayi/139/base 2025-12-04T08:57:43.5991495Z * [new branch] gh/angelayi/139/head -> origin/gh/angelayi/139/head 2025-12-04T08:57:43.5992597Z * [new branch] gh/angelayi/139/orig -> origin/gh/angelayi/139/orig 2025-12-04T08:57:43.5994261Z * [new branch] gh/angelayi/140/base -> origin/gh/angelayi/140/base 2025-12-04T08:57:43.5995407Z * [new branch] gh/angelayi/140/head -> origin/gh/angelayi/140/head 2025-12-04T08:57:43.5996556Z * [new branch] gh/angelayi/140/orig -> origin/gh/angelayi/140/orig 2025-12-04T08:57:43.5998485Z * [new branch] gh/angelayi/141/base -> origin/gh/angelayi/141/base 2025-12-04T08:57:43.5999691Z * [new branch] gh/angelayi/141/head -> origin/gh/angelayi/141/head 2025-12-04T08:57:43.6000699Z * [new branch] gh/angelayi/141/orig -> origin/gh/angelayi/141/orig 2025-12-04T08:57:43.6002264Z * [new branch] gh/angelayi/142/base -> origin/gh/angelayi/142/base 2025-12-04T08:57:43.6003304Z * [new branch] gh/angelayi/142/head -> origin/gh/angelayi/142/head 2025-12-04T08:57:43.6004402Z * [new 
branch] gh/angelayi/142/orig -> origin/gh/angelayi/142/orig 2025-12-04T08:57:43.6005994Z * [new branch] gh/angelayi/143/base -> origin/gh/angelayi/143/base 2025-12-04T08:57:43.6007012Z * [new branch] gh/angelayi/143/head -> origin/gh/angelayi/143/head 2025-12-04T08:57:43.6008091Z * [new branch] gh/angelayi/143/orig -> origin/gh/angelayi/143/orig 2025-12-04T08:57:43.6009821Z * [new branch] gh/angelayi/144/base -> origin/gh/angelayi/144/base 2025-12-04T08:57:43.6011005Z * [new branch] gh/angelayi/144/head -> origin/gh/angelayi/144/head 2025-12-04T08:57:43.6012161Z * [new branch] gh/angelayi/144/orig -> origin/gh/angelayi/144/orig 2025-12-04T08:57:43.6014551Z * [new branch] gh/anijain2305/753/base -> origin/gh/anijain2305/753/base 2025-12-04T08:57:43.6015651Z * [new branch] gh/anijain2305/753/head -> origin/gh/anijain2305/753/head 2025-12-04T08:57:43.6016772Z * [new branch] gh/anijain2305/753/orig -> origin/gh/anijain2305/753/orig 2025-12-04T08:57:43.6018497Z * [new branch] gh/anijain2305/810/base -> origin/gh/anijain2305/810/base 2025-12-04T08:57:43.6019585Z * [new branch] gh/anijain2305/810/head -> origin/gh/anijain2305/810/head 2025-12-04T08:57:43.6020740Z * [new branch] gh/anijain2305/810/orig -> origin/gh/anijain2305/810/orig 2025-12-04T08:57:43.6022385Z * [new branch] gh/anijain2305/854/base -> origin/gh/anijain2305/854/base 2025-12-04T08:57:43.6023545Z * [new branch] gh/anijain2305/854/head -> origin/gh/anijain2305/854/head 2025-12-04T08:57:43.6024678Z * [new branch] gh/anijain2305/854/orig -> origin/gh/anijain2305/854/orig 2025-12-04T08:57:43.6026484Z * [new branch] gh/anijain2305/864/base -> origin/gh/anijain2305/864/base 2025-12-04T08:57:43.6027526Z * [new branch] gh/anijain2305/864/head -> origin/gh/anijain2305/864/head 2025-12-04T08:57:43.6028641Z * [new branch] gh/anijain2305/864/orig -> origin/gh/anijain2305/864/orig 2025-12-04T08:57:43.6030289Z * [new branch] gh/anijain2305/870/base -> origin/gh/anijain2305/870/base 2025-12-04T08:57:43.6031278Z * [new branch] gh/anijain2305/870/head -> origin/gh/anijain2305/870/head 2025-12-04T08:57:43.6032376Z * [new branch] gh/anijain2305/870/orig -> origin/gh/anijain2305/870/orig 2025-12-04T08:57:43.6034037Z * [new branch] gh/anijain2305/873/base -> origin/gh/anijain2305/873/base 2025-12-04T08:57:43.6035023Z * [new branch] gh/anijain2305/873/head -> origin/gh/anijain2305/873/head 2025-12-04T08:57:43.6036091Z * [new branch] gh/anijain2305/873/orig -> origin/gh/anijain2305/873/orig 2025-12-04T08:57:43.6037617Z * [new branch] gh/anijain2305/894/base -> origin/gh/anijain2305/894/base 2025-12-04T08:57:43.6038696Z * [new branch] gh/anijain2305/894/head -> origin/gh/anijain2305/894/head 2025-12-04T08:57:43.6039840Z * [new branch] gh/anijain2305/894/orig -> origin/gh/anijain2305/894/orig 2025-12-04T08:57:43.6041435Z * [new branch] gh/anijain2305/895/base -> origin/gh/anijain2305/895/base 2025-12-04T08:57:43.6042526Z * [new branch] gh/anijain2305/895/head -> origin/gh/anijain2305/895/head 2025-12-04T08:57:43.6043618Z * [new branch] gh/anijain2305/895/orig -> origin/gh/anijain2305/895/orig 2025-12-04T08:57:43.6045328Z * [new branch] gh/anijain2305/910/base -> origin/gh/anijain2305/910/base 2025-12-04T08:57:43.6046293Z * [new branch] gh/anijain2305/910/head -> origin/gh/anijain2305/910/head 2025-12-04T08:57:43.6047431Z * [new branch] gh/anijain2305/910/orig -> origin/gh/anijain2305/910/orig 2025-12-04T08:57:43.6049127Z * [new branch] gh/anijain2305/919/base -> origin/gh/anijain2305/919/base 2025-12-04T08:57:43.6050223Z * [new branch] 
gh/anijain2305/919/head -> origin/gh/anijain2305/919/head 2025-12-04T08:57:43.6051336Z * [new branch] gh/anijain2305/919/orig -> origin/gh/anijain2305/919/orig 2025-12-04T08:57:43.6052891Z * [new branch] gh/anijain2305/922/base -> origin/gh/anijain2305/922/base 2025-12-04T08:57:43.6054418Z * [new branch] gh/anijain2305/922/head -> origin/gh/anijain2305/922/head 2025-12-04T08:57:43.6055540Z * [new branch] gh/anijain2305/922/orig -> origin/gh/anijain2305/922/orig 2025-12-04T08:57:43.6057181Z * [new branch] gh/anijain2305/932/base -> origin/gh/anijain2305/932/base 2025-12-04T08:57:43.6058391Z * [new branch] gh/anijain2305/932/head -> origin/gh/anijain2305/932/head 2025-12-04T08:57:43.6059653Z * [new branch] gh/anijain2305/932/orig -> origin/gh/anijain2305/932/orig 2025-12-04T08:57:43.6061227Z * [new branch] gh/anijain2305/940/base -> origin/gh/anijain2305/940/base 2025-12-04T08:57:43.6062281Z * [new branch] gh/anijain2305/940/head -> origin/gh/anijain2305/940/head 2025-12-04T08:57:43.6063423Z * [new branch] gh/anijain2305/940/orig -> origin/gh/anijain2305/940/orig 2025-12-04T08:57:43.6065008Z * [new branch] gh/anijain2305/941/base -> origin/gh/anijain2305/941/base 2025-12-04T08:57:43.6066176Z * [new branch] gh/anijain2305/941/head -> origin/gh/anijain2305/941/head 2025-12-04T08:57:43.6067303Z * [new branch] gh/anijain2305/941/orig -> origin/gh/anijain2305/941/orig 2025-12-04T08:57:43.6068859Z * [new branch] gh/anijain2305/942/base -> origin/gh/anijain2305/942/base 2025-12-04T08:57:43.6069964Z * [new branch] gh/anijain2305/942/head -> origin/gh/anijain2305/942/head 2025-12-04T08:57:43.6071151Z * [new branch] gh/anijain2305/942/orig -> origin/gh/anijain2305/942/orig 2025-12-04T08:57:43.6072769Z * [new branch] gh/anijain2305/943/base -> origin/gh/anijain2305/943/base 2025-12-04T08:57:43.6073922Z * [new branch] gh/anijain2305/943/head -> origin/gh/anijain2305/943/head 2025-12-04T08:57:43.6075275Z * [new branch] gh/anijain2305/943/orig -> origin/gh/anijain2305/943/orig 2025-12-04T08:57:43.6077333Z * [new branch] gh/anijain2305/944/base -> origin/gh/anijain2305/944/base 2025-12-04T08:57:43.6078373Z * [new branch] gh/anijain2305/944/head -> origin/gh/anijain2305/944/head 2025-12-04T08:57:43.6079997Z * [new branch] gh/anijain2305/944/orig -> origin/gh/anijain2305/944/orig 2025-12-04T08:57:43.6082247Z * [new branch] gh/anijain2305/945/base -> origin/gh/anijain2305/945/base 2025-12-04T08:57:43.6083385Z * [new branch] gh/anijain2305/945/head -> origin/gh/anijain2305/945/head 2025-12-04T08:57:43.6084543Z * [new branch] gh/anijain2305/945/orig -> origin/gh/anijain2305/945/orig 2025-12-04T08:57:43.6086169Z * [new branch] gh/anijain2305/946/base -> origin/gh/anijain2305/946/base 2025-12-04T08:57:43.6087239Z * [new branch] gh/anijain2305/946/head -> origin/gh/anijain2305/946/head 2025-12-04T08:57:43.6088393Z * [new branch] gh/anijain2305/946/orig -> origin/gh/anijain2305/946/orig 2025-12-04T08:57:43.6089974Z * [new branch] gh/anijain2305/947/base -> origin/gh/anijain2305/947/base 2025-12-04T08:57:43.6091312Z * [new branch] gh/anijain2305/947/head -> origin/gh/anijain2305/947/head 2025-12-04T08:57:43.6092415Z * [new branch] gh/anijain2305/947/orig -> origin/gh/anijain2305/947/orig 2025-12-04T08:57:43.6094696Z * [new branch] gh/anijain2305/948/base -> origin/gh/anijain2305/948/base 2025-12-04T08:57:43.6095418Z * [new branch] gh/anijain2305/948/head -> origin/gh/anijain2305/948/head 2025-12-04T08:57:43.6096562Z * [new branch] gh/anijain2305/948/orig -> origin/gh/anijain2305/948/orig 2025-12-04T08:57:43.6098190Z 
* [new branch] gh/anijain2305/949/base -> origin/gh/anijain2305/949/base 2025-12-04T08:57:43.6099324Z * [new branch] gh/anijain2305/949/head -> origin/gh/anijain2305/949/head 2025-12-04T08:57:43.6100499Z * [new branch] gh/anijain2305/949/orig -> origin/gh/anijain2305/949/orig 2025-12-04T08:57:43.6102138Z * [new branch] gh/anijain2305/950/base -> origin/gh/anijain2305/950/base 2025-12-04T08:57:43.6103214Z * [new branch] gh/anijain2305/950/head -> origin/gh/anijain2305/950/head 2025-12-04T08:57:43.6104343Z * [new branch] gh/anijain2305/950/orig -> origin/gh/anijain2305/950/orig 2025-12-04T08:57:43.6106092Z * [new branch] gh/anijain2305/951/base -> origin/gh/anijain2305/951/base 2025-12-04T08:57:43.6107186Z * [new branch] gh/anijain2305/951/head -> origin/gh/anijain2305/951/head 2025-12-04T08:57:43.6108289Z * [new branch] gh/anijain2305/951/orig -> origin/gh/anijain2305/951/orig 2025-12-04T08:57:43.6109904Z * [new branch] gh/anijain2305/952/base -> origin/gh/anijain2305/952/base 2025-12-04T08:57:43.6110951Z * [new branch] gh/anijain2305/952/head -> origin/gh/anijain2305/952/head 2025-12-04T08:57:43.6112022Z * [new branch] gh/anijain2305/952/orig -> origin/gh/anijain2305/952/orig 2025-12-04T08:57:43.6113566Z * [new branch] gh/anijain2305/953/base -> origin/gh/anijain2305/953/base 2025-12-04T08:57:43.6114612Z * [new branch] gh/anijain2305/953/head -> origin/gh/anijain2305/953/head 2025-12-04T08:57:43.6115732Z * [new branch] gh/anijain2305/953/orig -> origin/gh/anijain2305/953/orig 2025-12-04T08:57:43.6117343Z * [new branch] gh/anijain2305/954/base -> origin/gh/anijain2305/954/base 2025-12-04T08:57:43.6118453Z * [new branch] gh/anijain2305/954/head -> origin/gh/anijain2305/954/head 2025-12-04T08:57:43.6120517Z * [new branch] gh/anijain2305/954/orig -> origin/gh/anijain2305/954/orig 2025-12-04T08:57:43.6122387Z * [new branch] gh/anijain2305/955/base -> origin/gh/anijain2305/955/base 2025-12-04T08:57:43.6123336Z * [new branch] gh/anijain2305/955/head -> origin/gh/anijain2305/955/head 2025-12-04T08:57:43.6124459Z * [new branch] gh/anijain2305/955/orig -> origin/gh/anijain2305/955/orig 2025-12-04T08:57:43.6126137Z * [new branch] gh/anijain2305/956/base -> origin/gh/anijain2305/956/base 2025-12-04T08:57:43.6127193Z * [new branch] gh/anijain2305/956/head -> origin/gh/anijain2305/956/head 2025-12-04T08:57:43.6128320Z * [new branch] gh/anijain2305/956/orig -> origin/gh/anijain2305/956/orig 2025-12-04T08:57:43.6129971Z * [new branch] gh/anijain2305/957/base -> origin/gh/anijain2305/957/base 2025-12-04T08:57:43.6131053Z * [new branch] gh/anijain2305/957/head -> origin/gh/anijain2305/957/head 2025-12-04T08:57:43.6132184Z * [new branch] gh/anijain2305/957/orig -> origin/gh/anijain2305/957/orig 2025-12-04T08:57:43.6134132Z * [new branch] gh/anijain2305/958/base -> origin/gh/anijain2305/958/base 2025-12-04T08:57:43.6135317Z * [new branch] gh/anijain2305/958/head -> origin/gh/anijain2305/958/head 2025-12-04T08:57:43.6136512Z * [new branch] gh/anijain2305/958/orig -> origin/gh/anijain2305/958/orig 2025-12-04T08:57:43.6138052Z * [new branch] gh/anijain2305/959/base -> origin/gh/anijain2305/959/base 2025-12-04T08:57:43.6139255Z * [new branch] gh/anijain2305/959/head -> origin/gh/anijain2305/959/head 2025-12-04T08:57:43.6140420Z * [new branch] gh/anijain2305/959/orig -> origin/gh/anijain2305/959/orig 2025-12-04T08:57:43.6142097Z * [new branch] gh/anijain2305/960/base -> origin/gh/anijain2305/960/base 2025-12-04T08:57:43.6143234Z * [new branch] gh/anijain2305/960/head -> origin/gh/anijain2305/960/head 
2025-12-04T08:57:43.6144594Z * [new branch] gh/anijain2305/960/orig -> origin/gh/anijain2305/960/orig 2025-12-04T08:57:43.6146352Z * [new branch] gh/anijain2305/961/base -> origin/gh/anijain2305/961/base 2025-12-04T08:57:43.6147397Z * [new branch] gh/anijain2305/961/head -> origin/gh/anijain2305/961/head 2025-12-04T08:57:43.6148511Z * [new branch] gh/anijain2305/961/orig -> origin/gh/anijain2305/961/orig 2025-12-04T08:57:43.6150134Z * [new branch] gh/anijain2305/962/base -> origin/gh/anijain2305/962/base 2025-12-04T08:57:43.6151184Z * [new branch] gh/anijain2305/962/head -> origin/gh/anijain2305/962/head 2025-12-04T08:57:43.6152259Z * [new branch] gh/anijain2305/962/orig -> origin/gh/anijain2305/962/orig 2025-12-04T08:57:43.6154227Z * [new branch] gh/anijain2305/963/base -> origin/gh/anijain2305/963/base 2025-12-04T08:57:43.6155629Z * [new branch] gh/anijain2305/963/head -> origin/gh/anijain2305/963/head 2025-12-04T08:57:43.6156735Z * [new branch] gh/anijain2305/963/orig -> origin/gh/anijain2305/963/orig 2025-12-04T08:57:43.6158372Z * [new branch] gh/anijain2305/964/base -> origin/gh/anijain2305/964/base 2025-12-04T08:57:43.6159499Z * [new branch] gh/anijain2305/964/head -> origin/gh/anijain2305/964/head 2025-12-04T08:57:43.6160580Z * [new branch] gh/anijain2305/964/orig -> origin/gh/anijain2305/964/orig 2025-12-04T08:57:43.6162573Z * [new branch] gh/anijain2305/965/base -> origin/gh/anijain2305/965/base 2025-12-04T08:57:43.6163668Z * [new branch] gh/anijain2305/965/head -> origin/gh/anijain2305/965/head 2025-12-04T08:57:43.6164842Z * [new branch] gh/anijain2305/965/orig -> origin/gh/anijain2305/965/orig 2025-12-04T08:57:43.6166287Z * [new branch] gh/anijain2305/966/base -> origin/gh/anijain2305/966/base 2025-12-04T08:57:43.6167307Z * [new branch] gh/anijain2305/966/head -> origin/gh/anijain2305/966/head 2025-12-04T08:57:43.6168412Z * [new branch] gh/anijain2305/966/orig -> origin/gh/anijain2305/966/orig 2025-12-04T08:57:43.6170016Z * [new branch] gh/anijain2305/967/base -> origin/gh/anijain2305/967/base 2025-12-04T08:57:43.6171155Z * [new branch] gh/anijain2305/967/head -> origin/gh/anijain2305/967/head 2025-12-04T08:57:43.6172411Z * [new branch] gh/anijain2305/967/orig -> origin/gh/anijain2305/967/orig 2025-12-04T08:57:43.6174307Z * [new branch] gh/anijain2305/968/base -> origin/gh/anijain2305/968/base 2025-12-04T08:57:43.6175457Z * [new branch] gh/anijain2305/968/head -> origin/gh/anijain2305/968/head 2025-12-04T08:57:43.6176568Z * [new branch] gh/anijain2305/968/orig -> origin/gh/anijain2305/968/orig 2025-12-04T08:57:43.6178174Z * [new branch] gh/anijain2305/969/base -> origin/gh/anijain2305/969/base 2025-12-04T08:57:43.6179569Z * [new branch] gh/anijain2305/969/head -> origin/gh/anijain2305/969/head 2025-12-04T08:57:43.6180791Z * [new branch] gh/anijain2305/969/orig -> origin/gh/anijain2305/969/orig 2025-12-04T08:57:43.6182597Z * [new branch] gh/anijain2305/970/base -> origin/gh/anijain2305/970/base 2025-12-04T08:57:43.6183628Z * [new branch] gh/anijain2305/970/head -> origin/gh/anijain2305/970/head 2025-12-04T08:57:43.6184781Z * [new branch] gh/anijain2305/970/orig -> origin/gh/anijain2305/970/orig 2025-12-04T08:57:43.6186853Z * [new branch] gh/anjali411/216/base -> origin/gh/anjali411/216/base 2025-12-04T08:57:43.6187930Z * [new branch] gh/anjali411/216/head -> origin/gh/anjali411/216/head 2025-12-04T08:57:43.6189067Z * [new branch] gh/anjali411/216/orig -> origin/gh/anjali411/216/orig 2025-12-04T08:57:43.6191232Z * [new branch] gh/anshul-si/1/base -> origin/gh/anshul-si/1/base 
2025-12-04T08:57:43.6192314Z * [new branch] gh/anshul-si/1/head -> origin/gh/anshul-si/1/head 2025-12-04T08:57:43.6193739Z * [new branch] gh/anshul-si/2/base -> origin/gh/anshul-si/2/base 2025-12-04T08:57:43.6194746Z * [new branch] gh/anshul-si/2/head -> origin/gh/anshul-si/2/head 2025-12-04T08:57:43.6196088Z * [new branch] gh/anshul-si/3/base -> origin/gh/anshul-si/3/base 2025-12-04T08:57:43.6197072Z * [new branch] gh/anshul-si/3/head -> origin/gh/anshul-si/3/head 2025-12-04T08:57:43.6198471Z * [new branch] gh/anshul-si/4/base -> origin/gh/anshul-si/4/base 2025-12-04T08:57:43.6199426Z * [new branch] gh/anshul-si/4/head -> origin/gh/anshul-si/4/head 2025-12-04T08:57:43.6201752Z * [new branch] gh/anshul-si/5/base -> origin/gh/anshul-si/5/base 2025-12-04T08:57:43.6202540Z * [new branch] gh/anshul-si/5/head -> origin/gh/anshul-si/5/head 2025-12-04T08:57:43.6204213Z * [new branch] gh/anshul-si/53/base -> origin/gh/anshul-si/53/base 2025-12-04T08:57:43.6205276Z * [new branch] gh/anshul-si/53/head -> origin/gh/anshul-si/53/head 2025-12-04T08:57:43.6206871Z * [new branch] gh/anshul-si/58/base -> origin/gh/anshul-si/58/base 2025-12-04T08:57:43.6207979Z * [new branch] gh/anshul-si/58/head -> origin/gh/anshul-si/58/head 2025-12-04T08:57:43.6209350Z * [new branch] gh/anshul-si/66/base -> origin/gh/anshul-si/66/base 2025-12-04T08:57:43.6210495Z * [new branch] gh/anshul-si/66/head -> origin/gh/anshul-si/66/head 2025-12-04T08:57:43.6211537Z * [new branch] gh/anshul-si/66/orig -> origin/gh/anshul-si/66/orig 2025-12-04T08:57:43.6213135Z * [new branch] gh/anshul-si/67/base -> origin/gh/anshul-si/67/base 2025-12-04T08:57:43.6214456Z * [new branch] gh/anshul-si/67/head -> origin/gh/anshul-si/67/head 2025-12-04T08:57:43.6215700Z * [new branch] gh/anshul-si/67/orig -> origin/gh/anshul-si/67/orig 2025-12-04T08:57:43.6217479Z * [new branch] gh/anshul-si/68/base -> origin/gh/anshul-si/68/base 2025-12-04T08:57:43.6218536Z * [new branch] gh/anshul-si/68/head -> origin/gh/anshul-si/68/head 2025-12-04T08:57:43.6219606Z * [new branch] gh/anshul-si/68/orig -> origin/gh/anshul-si/68/orig 2025-12-04T08:57:43.6221449Z * [new branch] gh/anshul-si/69/base -> origin/gh/anshul-si/69/base 2025-12-04T08:57:43.6222471Z * [new branch] gh/anshul-si/69/head -> origin/gh/anshul-si/69/head 2025-12-04T08:57:43.6223607Z * [new branch] gh/anshul-si/69/orig -> origin/gh/anshul-si/69/orig 2025-12-04T08:57:43.6225202Z * [new branch] gh/anshul-si/70/base -> origin/gh/anshul-si/70/base 2025-12-04T08:57:43.6226441Z * [new branch] gh/anshul-si/70/head -> origin/gh/anshul-si/70/head 2025-12-04T08:57:43.6227583Z * [new branch] gh/anshul-si/70/orig -> origin/gh/anshul-si/70/orig 2025-12-04T08:57:43.6229218Z * [new branch] gh/anshul-si/71/base -> origin/gh/anshul-si/71/base 2025-12-04T08:57:43.6230246Z * [new branch] gh/anshul-si/71/head -> origin/gh/anshul-si/71/head 2025-12-04T08:57:43.6231435Z * [new branch] gh/anshul-si/71/orig -> origin/gh/anshul-si/71/orig 2025-12-04T08:57:43.6233038Z * [new branch] gh/anshul-si/72/base -> origin/gh/anshul-si/72/base 2025-12-04T08:57:43.6234166Z * [new branch] gh/anshul-si/72/head -> origin/gh/anshul-si/72/head 2025-12-04T08:57:43.6235299Z * [new branch] gh/anshul-si/72/orig -> origin/gh/anshul-si/72/orig 2025-12-04T08:57:43.6236824Z * [new branch] gh/anshul-si/73/base -> origin/gh/anshul-si/73/base 2025-12-04T08:57:43.6237951Z * [new branch] gh/anshul-si/73/head -> origin/gh/anshul-si/73/head 2025-12-04T08:57:43.6239036Z * [new branch] gh/anshul-si/73/orig -> origin/gh/anshul-si/73/orig 
2025-12-04T08:57:43.6240934Z * [new branch] gh/aorenste/132/base -> origin/gh/aorenste/132/base 2025-12-04T08:57:43.6242005Z * [new branch] gh/aorenste/132/head -> origin/gh/aorenste/132/head 2025-12-04T08:57:43.6243795Z * [new branch] gh/aorenste/134/base -> origin/gh/aorenste/134/base 2025-12-04T08:57:43.6244976Z * [new branch] gh/aorenste/134/head -> origin/gh/aorenste/134/head 2025-12-04T08:57:43.6246183Z * [new branch] gh/aorenste/134/orig -> origin/gh/aorenste/134/orig 2025-12-04T08:57:43.6247787Z * [new branch] gh/aorenste/139/base -> origin/gh/aorenste/139/base 2025-12-04T08:57:43.6248854Z * [new branch] gh/aorenste/139/head -> origin/gh/aorenste/139/head 2025-12-04T08:57:43.6249986Z * [new branch] gh/aorenste/139/orig -> origin/gh/aorenste/139/orig 2025-12-04T08:57:43.6270335Z * [new branch] gh/aorenste/141/base -> origin/gh/aorenste/141/base 2025-12-04T08:57:43.6271174Z * [new branch] gh/aorenste/141/head -> origin/gh/aorenste/141/head 2025-12-04T08:57:43.6271814Z * [new branch] gh/aorenste/145/base -> origin/gh/aorenste/145/base 2025-12-04T08:57:43.6272423Z * [new branch] gh/aorenste/145/head -> origin/gh/aorenste/145/head 2025-12-04T08:57:43.6273044Z * [new branch] gh/aorenste/145/orig -> origin/gh/aorenste/145/orig 2025-12-04T08:57:43.6273665Z * [new branch] gh/aorenste/146/base -> origin/gh/aorenste/146/base 2025-12-04T08:57:43.6274271Z * [new branch] gh/aorenste/146/head -> origin/gh/aorenste/146/head 2025-12-04T08:57:43.6274889Z * [new branch] gh/aorenste/146/orig -> origin/gh/aorenste/146/orig 2025-12-04T08:57:43.6275511Z * [new branch] gh/aorenste/147/base -> origin/gh/aorenste/147/base 2025-12-04T08:57:43.6276132Z * [new branch] gh/aorenste/147/head -> origin/gh/aorenste/147/head 2025-12-04T08:57:43.6276740Z * [new branch] gh/aorenste/147/orig -> origin/gh/aorenste/147/orig 2025-12-04T08:57:43.6277363Z * [new branch] gh/aorenste/148/base -> origin/gh/aorenste/148/base 2025-12-04T08:57:43.6277982Z * [new branch] gh/aorenste/148/head -> origin/gh/aorenste/148/head 2025-12-04T08:57:43.6278767Z * [new branch] gh/aorenste/148/orig -> origin/gh/aorenste/148/orig 2025-12-04T08:57:43.6279566Z * [new branch] gh/aorenste/149/base -> origin/gh/aorenste/149/base 2025-12-04T08:57:43.6280211Z * [new branch] gh/aorenste/149/head -> origin/gh/aorenste/149/head 2025-12-04T08:57:43.6280847Z * [new branch] gh/aorenste/149/orig -> origin/gh/aorenste/149/orig 2025-12-04T08:57:43.6281482Z * [new branch] gh/aorenste/150/base -> origin/gh/aorenste/150/base 2025-12-04T08:57:43.6282270Z * [new branch] gh/aorenste/150/head -> origin/gh/aorenste/150/head 2025-12-04T08:57:43.6282974Z * [new branch] gh/aorenste/150/orig -> origin/gh/aorenste/150/orig 2025-12-04T08:57:43.6283613Z * [new branch] gh/aorenste/151/base -> origin/gh/aorenste/151/base 2025-12-04T08:57:43.6284242Z * [new branch] gh/aorenste/151/head -> origin/gh/aorenste/151/head 2025-12-04T08:57:43.6284880Z * [new branch] gh/aorenste/151/orig -> origin/gh/aorenste/151/orig 2025-12-04T08:57:43.6285514Z * [new branch] gh/aorenste/152/base -> origin/gh/aorenste/152/base 2025-12-04T08:57:43.6286143Z * [new branch] gh/aorenste/152/head -> origin/gh/aorenste/152/head 2025-12-04T08:57:43.6286772Z * [new branch] gh/aorenste/152/orig -> origin/gh/aorenste/152/orig 2025-12-04T08:57:43.6287409Z * [new branch] gh/aorenste/153/base -> origin/gh/aorenste/153/base 2025-12-04T08:57:43.6288362Z * [new branch] gh/aorenste/153/head -> origin/gh/aorenste/153/head 2025-12-04T08:57:43.6289224Z * [new branch] gh/aorenste/153/orig -> origin/gh/aorenste/153/orig 
2025-12-04T08:57:43.6290652Z [git fetch output condensed: several hundred "* [new branch]" entries, each creating a remote-tracking ref under origin/gh/<user>/<N>/ (base, head, and usually orig) for the users aorenste, avikchaudhuri, bdhirsh, benjaminglass1, bobrenjc93, c00w, clee2000, coconutruben, colinchan15, d4l3k, davidberard98, desertfire, dharakk, drisspg, dsjohns2, dzmitry-huba, eellison, etaf, exclamaforte, ezyang, fadara01, fduwjj, fegin, fffrog, fxdawnn, galv, and guangyey; the branch listing continues below]
2025-12-04T08:57:43.7321612Z * [new branch] gh/guangyey/231/orig -> origin/gh/guangyey/231/orig 2025-12-04T08:57:43.7323123Z * [new branch] gh/guangyey/232/base -> origin/gh/guangyey/232/base 2025-12-04T08:57:43.7324251Z * [new branch] gh/guangyey/232/head -> origin/gh/guangyey/232/head 2025-12-04T08:57:43.7325320Z * [new branch] gh/guangyey/232/orig -> origin/gh/guangyey/232/orig 2025-12-04T08:57:43.7326810Z * [new branch] gh/guangyey/233/base -> origin/gh/guangyey/233/base 2025-12-04T08:57:43.7327875Z * [new branch] gh/guangyey/233/head -> origin/gh/guangyey/233/head 2025-12-04T08:57:43.7328995Z * [new branch] gh/guangyey/233/orig -> origin/gh/guangyey/233/orig 2025-12-04T08:57:43.7330459Z * [new branch] gh/guangyey/234/base -> origin/gh/guangyey/234/base 2025-12-04T08:57:43.7331582Z * [new branch] gh/guangyey/234/head -> origin/gh/guangyey/234/head 2025-12-04T08:57:43.7332674Z * [new branch] gh/guangyey/234/orig -> origin/gh/guangyey/234/orig 2025-12-04T08:57:43.7334495Z * [new branch] gh/guangyey/235/base -> origin/gh/guangyey/235/base 2025-12-04T08:57:43.7335670Z * [new branch] gh/guangyey/235/head -> origin/gh/guangyey/235/head 2025-12-04T08:57:43.7336905Z * [new branch] gh/guangyey/235/orig -> origin/gh/guangyey/235/orig 2025-12-04T08:57:43.7338441Z * [new branch] gh/guangyey/236/base -> origin/gh/guangyey/236/base 2025-12-04T08:57:43.7339545Z * [new branch] gh/guangyey/236/head -> origin/gh/guangyey/236/head 2025-12-04T08:57:43.7340690Z * [new branch] gh/guangyey/236/orig -> origin/gh/guangyey/236/orig 2025-12-04T08:57:43.7342345Z * [new branch] gh/guangyey/237/base -> origin/gh/guangyey/237/base 2025-12-04T08:57:43.7343422Z * [new branch] gh/guangyey/237/head -> origin/gh/guangyey/237/head 2025-12-04T08:57:43.7344543Z * [new branch] gh/guangyey/237/orig -> origin/gh/guangyey/237/orig 2025-12-04T08:57:43.7346152Z * [new branch] gh/guangyey/238/base -> origin/gh/guangyey/238/base 2025-12-04T08:57:43.7347243Z * [new branch] gh/guangyey/238/head -> origin/gh/guangyey/238/head 2025-12-04T08:57:43.7348701Z * [new branch] gh/guangyey/239/base -> origin/gh/guangyey/239/base 2025-12-04T08:57:43.7349878Z * [new branch] gh/guangyey/239/head -> origin/gh/guangyey/239/head 2025-12-04T08:57:43.7350958Z * [new branch] gh/guangyey/239/orig -> origin/gh/guangyey/239/orig 2025-12-04T08:57:43.7352554Z * [new branch] gh/guangyey/240/base -> origin/gh/guangyey/240/base 2025-12-04T08:57:43.7353755Z * [new branch] gh/guangyey/240/head -> origin/gh/guangyey/240/head 2025-12-04T08:57:43.7354921Z * [new branch] gh/guangyey/240/orig -> origin/gh/guangyey/240/orig 2025-12-04T08:57:43.7356351Z * [new branch] gh/guangyey/241/base -> origin/gh/guangyey/241/base 2025-12-04T08:57:43.7357415Z * [new branch] gh/guangyey/241/head -> origin/gh/guangyey/241/head 2025-12-04T08:57:43.7358521Z * [new branch] gh/guangyey/241/orig -> origin/gh/guangyey/241/orig 2025-12-04T08:57:43.7359996Z * [new branch] gh/guangyey/242/base -> origin/gh/guangyey/242/base 2025-12-04T08:57:43.7361096Z * [new branch] gh/guangyey/242/head -> origin/gh/guangyey/242/head 2025-12-04T08:57:43.7362199Z * [new branch] gh/guangyey/242/orig -> origin/gh/guangyey/242/orig 2025-12-04T08:57:43.7363718Z * [new branch] gh/guangyey/243/base -> origin/gh/guangyey/243/base 2025-12-04T08:57:43.7364781Z * [new branch] gh/guangyey/243/head -> origin/gh/guangyey/243/head 2025-12-04T08:57:43.7365862Z * [new branch] gh/guangyey/243/orig -> origin/gh/guangyey/243/orig 2025-12-04T08:57:43.7367586Z * [new branch] gh/guangyey/244/base -> origin/gh/guangyey/244/base 
2025-12-04T08:57:43.7368711Z * [new branch] gh/guangyey/244/head -> origin/gh/guangyey/244/head 2025-12-04T08:57:43.7369874Z * [new branch] gh/guangyey/244/orig -> origin/gh/guangyey/244/orig 2025-12-04T08:57:43.7371390Z * [new branch] gh/guangyey/245/base -> origin/gh/guangyey/245/base 2025-12-04T08:57:43.7372488Z * [new branch] gh/guangyey/245/head -> origin/gh/guangyey/245/head 2025-12-04T08:57:43.7373940Z * [new branch] gh/guangyey/245/orig -> origin/gh/guangyey/245/orig 2025-12-04T08:57:43.7375524Z * [new branch] gh/guangyey/246/base -> origin/gh/guangyey/246/base 2025-12-04T08:57:43.7376639Z * [new branch] gh/guangyey/246/head -> origin/gh/guangyey/246/head 2025-12-04T08:57:43.7377754Z * [new branch] gh/guangyey/246/orig -> origin/gh/guangyey/246/orig 2025-12-04T08:57:43.7382717Z * [new branch] gh/guangyey/247/base -> origin/gh/guangyey/247/base 2025-12-04T08:57:43.7384021Z * [new branch] gh/guangyey/247/head -> origin/gh/guangyey/247/head 2025-12-04T08:57:43.7385203Z * [new branch] gh/guangyey/247/orig -> origin/gh/guangyey/247/orig 2025-12-04T08:57:43.7386867Z * [new branch] gh/guangyey/248/base -> origin/gh/guangyey/248/base 2025-12-04T08:57:43.7387951Z * [new branch] gh/guangyey/248/head -> origin/gh/guangyey/248/head 2025-12-04T08:57:43.7389083Z * [new branch] gh/guangyey/248/orig -> origin/gh/guangyey/248/orig 2025-12-04T08:57:43.7390855Z * [new branch] gh/guangyey/249/base -> origin/gh/guangyey/249/base 2025-12-04T08:57:43.7391795Z * [new branch] gh/guangyey/249/head -> origin/gh/guangyey/249/head 2025-12-04T08:57:43.7393097Z * [new branch] gh/guangyey/249/orig -> origin/gh/guangyey/249/orig 2025-12-04T08:57:43.7394582Z * [new branch] gh/guangyey/250/base -> origin/gh/guangyey/250/base 2025-12-04T08:57:43.7395664Z * [new branch] gh/guangyey/250/head -> origin/gh/guangyey/250/head 2025-12-04T08:57:43.7396758Z * [new branch] gh/guangyey/250/orig -> origin/gh/guangyey/250/orig 2025-12-04T08:57:43.7398152Z * [new branch] gh/guangyey/251/base -> origin/gh/guangyey/251/base 2025-12-04T08:57:43.7399258Z * [new branch] gh/guangyey/251/head -> origin/gh/guangyey/251/head 2025-12-04T08:57:43.7400345Z * [new branch] gh/guangyey/251/orig -> origin/gh/guangyey/251/orig 2025-12-04T08:57:43.7401975Z * [new branch] gh/guangyey/252/base -> origin/gh/guangyey/252/base 2025-12-04T08:57:43.7403040Z * [new branch] gh/guangyey/252/head -> origin/gh/guangyey/252/head 2025-12-04T08:57:43.7404157Z * [new branch] gh/guangyey/252/orig -> origin/gh/guangyey/252/orig 2025-12-04T08:57:43.7405634Z * [new branch] gh/guangyey/253/base -> origin/gh/guangyey/253/base 2025-12-04T08:57:43.7406700Z * [new branch] gh/guangyey/253/head -> origin/gh/guangyey/253/head 2025-12-04T08:57:43.7407814Z * [new branch] gh/guangyey/253/orig -> origin/gh/guangyey/253/orig 2025-12-04T08:57:43.7409254Z * [new branch] gh/guangyey/254/base -> origin/gh/guangyey/254/base 2025-12-04T08:57:43.7410342Z * [new branch] gh/guangyey/254/head -> origin/gh/guangyey/254/head 2025-12-04T08:57:43.7411443Z * [new branch] gh/guangyey/254/orig -> origin/gh/guangyey/254/orig 2025-12-04T08:57:43.7412912Z * [new branch] gh/guangyey/255/base -> origin/gh/guangyey/255/base 2025-12-04T08:57:43.7414376Z * [new branch] gh/guangyey/255/head -> origin/gh/guangyey/255/head 2025-12-04T08:57:43.7415585Z * [new branch] gh/guangyey/255/orig -> origin/gh/guangyey/255/orig 2025-12-04T08:57:43.7417593Z * [new branch] gh/guilhermeleobas/107/base -> origin/gh/guilhermeleobas/107/base 2025-12-04T08:57:43.7418722Z * [new branch] gh/guilhermeleobas/107/head -> 
origin/gh/guilhermeleobas/107/head 2025-12-04T08:57:43.7419882Z * [new branch] gh/guilhermeleobas/107/orig -> origin/gh/guilhermeleobas/107/orig 2025-12-04T08:57:43.7421298Z * [new branch] gh/guilhermeleobas/108/base -> origin/gh/guilhermeleobas/108/base 2025-12-04T08:57:43.7422821Z * [new branch] gh/guilhermeleobas/108/head -> origin/gh/guilhermeleobas/108/head 2025-12-04T08:57:43.7423710Z * [new branch] gh/guilhermeleobas/108/orig -> origin/gh/guilhermeleobas/108/orig 2025-12-04T08:57:43.7425265Z * [new branch] gh/guilhermeleobas/150/base -> origin/gh/guilhermeleobas/150/base 2025-12-04T08:57:43.7426464Z * [new branch] gh/guilhermeleobas/150/head -> origin/gh/guilhermeleobas/150/head 2025-12-04T08:57:43.7429662Z * [new branch] gh/guilhermeleobas/150/orig -> origin/gh/guilhermeleobas/150/orig 2025-12-04T08:57:43.7431046Z * [new branch] gh/guilhermeleobas/168/base -> origin/gh/guilhermeleobas/168/base 2025-12-04T08:57:43.7432109Z * [new branch] gh/guilhermeleobas/168/head -> origin/gh/guilhermeleobas/168/head 2025-12-04T08:57:43.7433350Z * [new branch] gh/guilhermeleobas/168/orig -> origin/gh/guilhermeleobas/168/orig 2025-12-04T08:57:43.7434755Z * [new branch] gh/guilhermeleobas/169/base -> origin/gh/guilhermeleobas/169/base 2025-12-04T08:57:43.7436010Z * [new branch] gh/guilhermeleobas/169/head -> origin/gh/guilhermeleobas/169/head 2025-12-04T08:57:43.7437096Z * [new branch] gh/guilhermeleobas/169/orig -> origin/gh/guilhermeleobas/169/orig 2025-12-04T08:57:43.7439411Z * [new branch] gh/guilhermeleobas/170/base -> origin/gh/guilhermeleobas/170/base 2025-12-04T08:57:43.7439847Z * [new branch] gh/guilhermeleobas/170/head -> origin/gh/guilhermeleobas/170/head 2025-12-04T08:57:43.7440709Z * [new branch] gh/guilhermeleobas/170/orig -> origin/gh/guilhermeleobas/170/orig 2025-12-04T08:57:43.7442239Z * [new branch] gh/guilhermeleobas/171/base -> origin/gh/guilhermeleobas/171/base 2025-12-04T08:57:43.7443335Z * [new branch] gh/guilhermeleobas/171/head -> origin/gh/guilhermeleobas/171/head 2025-12-04T08:57:43.7444417Z * [new branch] gh/guilhermeleobas/171/orig -> origin/gh/guilhermeleobas/171/orig 2025-12-04T08:57:43.7446011Z * [new branch] gh/guilhermeleobas/173/base -> origin/gh/guilhermeleobas/173/base 2025-12-04T08:57:43.7446917Z * [new branch] gh/guilhermeleobas/173/head -> origin/gh/guilhermeleobas/173/head 2025-12-04T08:57:43.7448178Z * [new branch] gh/guilhermeleobas/173/orig -> origin/gh/guilhermeleobas/173/orig 2025-12-04T08:57:43.7449606Z * [new branch] gh/guilhermeleobas/193/base -> origin/gh/guilhermeleobas/193/base 2025-12-04T08:57:43.7450708Z * [new branch] gh/guilhermeleobas/193/head -> origin/gh/guilhermeleobas/193/head 2025-12-04T08:57:43.7451948Z * [new branch] gh/guilhermeleobas/193/orig -> origin/gh/guilhermeleobas/193/orig 2025-12-04T08:57:43.7453614Z * [new branch] gh/guilhermeleobas/204/base -> origin/gh/guilhermeleobas/204/base 2025-12-04T08:57:43.7454863Z * [new branch] gh/guilhermeleobas/204/head -> origin/gh/guilhermeleobas/204/head 2025-12-04T08:57:43.7455970Z * [new branch] gh/guilhermeleobas/204/orig -> origin/gh/guilhermeleobas/204/orig 2025-12-04T08:57:43.7457476Z * [new branch] gh/guilhermeleobas/211/base -> origin/gh/guilhermeleobas/211/base 2025-12-04T08:57:43.7458581Z * [new branch] gh/guilhermeleobas/211/head -> origin/gh/guilhermeleobas/211/head 2025-12-04T08:57:43.7459730Z * [new branch] gh/guilhermeleobas/211/orig -> origin/gh/guilhermeleobas/211/orig 2025-12-04T08:57:43.7461197Z * [new branch] gh/guilhermeleobas/226/base -> origin/gh/guilhermeleobas/226/base 
2025-12-04T08:57:43.7462283Z * [new branch] gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head 2025-12-04T08:57:43.7463503Z * [new branch] gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig 2025-12-04T08:57:43.7464984Z * [new branch] gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base 2025-12-04T08:57:43.7466211Z * [new branch] gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head 2025-12-04T08:57:43.7467293Z * [new branch] gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig 2025-12-04T08:57:43.7468720Z * [new branch] gh/guilhermeleobas/247/base -> origin/gh/guilhermeleobas/247/base 2025-12-04T08:57:43.7469815Z * [new branch] gh/guilhermeleobas/247/head -> origin/gh/guilhermeleobas/247/head 2025-12-04T08:57:43.7470948Z * [new branch] gh/guilhermeleobas/247/orig -> origin/gh/guilhermeleobas/247/orig 2025-12-04T08:57:43.7472413Z * [new branch] gh/guilhermeleobas/248/base -> origin/gh/guilhermeleobas/248/base 2025-12-04T08:57:43.7473506Z * [new branch] gh/guilhermeleobas/248/head -> origin/gh/guilhermeleobas/248/head 2025-12-04T08:57:43.7474718Z * [new branch] gh/guilhermeleobas/248/orig -> origin/gh/guilhermeleobas/248/orig 2025-12-04T08:57:43.7476159Z * [new branch] gh/guilhermeleobas/250/base -> origin/gh/guilhermeleobas/250/base 2025-12-04T08:57:43.7477362Z * [new branch] gh/guilhermeleobas/250/head -> origin/gh/guilhermeleobas/250/head 2025-12-04T08:57:43.7478560Z * [new branch] gh/guilhermeleobas/250/orig -> origin/gh/guilhermeleobas/250/orig 2025-12-04T08:57:43.7480911Z * [new branch] gh/guilhermeleobas/253/base -> origin/gh/guilhermeleobas/253/base 2025-12-04T08:57:43.7482038Z * [new branch] gh/guilhermeleobas/253/head -> origin/gh/guilhermeleobas/253/head 2025-12-04T08:57:43.7483220Z * [new branch] gh/guilhermeleobas/253/orig -> origin/gh/guilhermeleobas/253/orig 2025-12-04T08:57:43.7484777Z * [new branch] gh/guilhermeleobas/254/base -> origin/gh/guilhermeleobas/254/base 2025-12-04T08:57:43.7485922Z * [new branch] gh/guilhermeleobas/254/head -> origin/gh/guilhermeleobas/254/head 2025-12-04T08:57:43.7487060Z * [new branch] gh/guilhermeleobas/254/orig -> origin/gh/guilhermeleobas/254/orig 2025-12-04T08:57:43.7488771Z * [new branch] gh/guilhermeleobas/255/base -> origin/gh/guilhermeleobas/255/base 2025-12-04T08:57:43.7489916Z * [new branch] gh/guilhermeleobas/255/head -> origin/gh/guilhermeleobas/255/head 2025-12-04T08:57:43.7491080Z * [new branch] gh/guilhermeleobas/255/orig -> origin/gh/guilhermeleobas/255/orig 2025-12-04T08:57:43.7492797Z * [new branch] gh/guilhermeleobas/256/base -> origin/gh/guilhermeleobas/256/base 2025-12-04T08:57:43.7494270Z * [new branch] gh/guilhermeleobas/256/head -> origin/gh/guilhermeleobas/256/head 2025-12-04T08:57:43.7495575Z * [new branch] gh/guilhermeleobas/256/orig -> origin/gh/guilhermeleobas/256/orig 2025-12-04T08:57:43.7496988Z * [new branch] gh/guilhermeleobas/257/base -> origin/gh/guilhermeleobas/257/base 2025-12-04T08:57:43.7498100Z * [new branch] gh/guilhermeleobas/257/head -> origin/gh/guilhermeleobas/257/head 2025-12-04T08:57:43.7499398Z * [new branch] gh/guilhermeleobas/257/orig -> origin/gh/guilhermeleobas/257/orig 2025-12-04T08:57:43.7500925Z * [new branch] gh/guilhermeleobas/258/base -> origin/gh/guilhermeleobas/258/base 2025-12-04T08:57:43.7502086Z * [new branch] gh/guilhermeleobas/258/head -> origin/gh/guilhermeleobas/258/head 2025-12-04T08:57:43.7503486Z * [new branch] gh/guilhermeleobas/258/orig -> origin/gh/guilhermeleobas/258/orig 2025-12-04T08:57:43.7504809Z * 
[new branch] gh/guilhermeleobas/259/base -> origin/gh/guilhermeleobas/259/base 2025-12-04T08:57:43.7508171Z * [new branch] gh/guilhermeleobas/259/head -> origin/gh/guilhermeleobas/259/head 2025-12-04T08:57:43.7509547Z * [new branch] gh/guilhermeleobas/259/orig -> origin/gh/guilhermeleobas/259/orig 2025-12-04T08:57:43.7509842Z * [new branch] gh/guilhermeleobas/260/base -> origin/gh/guilhermeleobas/260/base 2025-12-04T08:57:43.7510412Z * [new branch] gh/guilhermeleobas/260/head -> origin/gh/guilhermeleobas/260/head 2025-12-04T08:57:43.7511454Z * [new branch] gh/guilhermeleobas/260/orig -> origin/gh/guilhermeleobas/260/orig 2025-12-04T08:57:43.7512907Z * [new branch] gh/guilhermeleobas/261/base -> origin/gh/guilhermeleobas/261/base 2025-12-04T08:57:43.7514012Z * [new branch] gh/guilhermeleobas/261/head -> origin/gh/guilhermeleobas/261/head 2025-12-04T08:57:43.7515091Z * [new branch] gh/guilhermeleobas/261/orig -> origin/gh/guilhermeleobas/261/orig 2025-12-04T08:57:43.7516685Z * [new branch] gh/guilhermeleobas/262/base -> origin/gh/guilhermeleobas/262/base 2025-12-04T08:57:43.7517873Z * [new branch] gh/guilhermeleobas/262/head -> origin/gh/guilhermeleobas/262/head 2025-12-04T08:57:43.7518976Z * [new branch] gh/guilhermeleobas/262/orig -> origin/gh/guilhermeleobas/262/orig 2025-12-04T08:57:43.7520456Z * [new branch] gh/guilhermeleobas/263/base -> origin/gh/guilhermeleobas/263/base 2025-12-04T08:57:43.7521729Z * [new branch] gh/guilhermeleobas/263/head -> origin/gh/guilhermeleobas/263/head 2025-12-04T08:57:43.7522620Z * [new branch] gh/guilhermeleobas/263/orig -> origin/gh/guilhermeleobas/263/orig 2025-12-04T08:57:43.7524193Z * [new branch] gh/guilhermeleobas/264/base -> origin/gh/guilhermeleobas/264/base 2025-12-04T08:57:43.7525293Z * [new branch] gh/guilhermeleobas/264/head -> origin/gh/guilhermeleobas/264/head 2025-12-04T08:57:43.7526485Z * [new branch] gh/guilhermeleobas/264/orig -> origin/gh/guilhermeleobas/264/orig 2025-12-04T08:57:43.7527994Z * [new branch] gh/guilhermeleobas/265/base -> origin/gh/guilhermeleobas/265/base 2025-12-04T08:57:43.7529123Z * [new branch] gh/guilhermeleobas/265/head -> origin/gh/guilhermeleobas/265/head 2025-12-04T08:57:43.7530232Z * [new branch] gh/guilhermeleobas/265/orig -> origin/gh/guilhermeleobas/265/orig 2025-12-04T08:57:43.7531781Z * [new branch] gh/guilhermeleobas/266/base -> origin/gh/guilhermeleobas/266/base 2025-12-04T08:57:43.7532848Z * [new branch] gh/guilhermeleobas/266/head -> origin/gh/guilhermeleobas/266/head 2025-12-04T08:57:43.7534419Z * [new branch] gh/guilhermeleobas/266/orig -> origin/gh/guilhermeleobas/266/orig 2025-12-04T08:57:43.7535891Z * [new branch] gh/guilhermeleobas/267/base -> origin/gh/guilhermeleobas/267/base 2025-12-04T08:57:43.7537022Z * [new branch] gh/guilhermeleobas/267/head -> origin/gh/guilhermeleobas/267/head 2025-12-04T08:57:43.7538187Z * [new branch] gh/guilhermeleobas/267/orig -> origin/gh/guilhermeleobas/267/orig 2025-12-04T08:57:43.7540475Z * [new branch] gh/hameerabbasi/1/base -> origin/gh/hameerabbasi/1/base 2025-12-04T08:57:43.7541755Z * [new branch] gh/hameerabbasi/1/head -> origin/gh/hameerabbasi/1/head 2025-12-04T08:57:43.7543653Z * [new branch] gh/hameerabbasi/2/base -> origin/gh/hameerabbasi/2/base 2025-12-04T08:57:43.7544836Z * [new branch] gh/hameerabbasi/2/head -> origin/gh/hameerabbasi/2/head 2025-12-04T08:57:43.7546103Z * [new branch] gh/hameerabbasi/2/orig -> origin/gh/hameerabbasi/2/orig 2025-12-04T08:57:43.7547493Z * [new branch] gh/hameerabbasi/3/base -> origin/gh/hameerabbasi/3/base 
2025-12-04T08:57:43.7548638Z * [new branch] gh/hameerabbasi/3/head -> origin/gh/hameerabbasi/3/head 2025-12-04T08:57:43.7549875Z * [new branch] gh/hameerabbasi/3/orig -> origin/gh/hameerabbasi/3/orig 2025-12-04T08:57:43.7551291Z * [new branch] gh/hameerabbasi/4/base -> origin/gh/hameerabbasi/4/base 2025-12-04T08:57:43.7552800Z * [new branch] gh/hameerabbasi/4/head -> origin/gh/hameerabbasi/4/head 2025-12-04T08:57:43.7553374Z * [new branch] gh/hameerabbasi/4/orig -> origin/gh/hameerabbasi/4/orig 2025-12-04T08:57:43.7555219Z * [new branch] gh/huydhn/1/next -> origin/gh/huydhn/1/next 2025-12-04T08:57:43.7556891Z * [new branch] gh/huydhn/2/next -> origin/gh/huydhn/2/next 2025-12-04T08:57:43.7558421Z * [new branch] gh/huydhn/3/next -> origin/gh/huydhn/3/next 2025-12-04T08:57:43.7559916Z * [new branch] gh/huydhn/4/next -> origin/gh/huydhn/4/next 2025-12-04T08:57:43.7561286Z * [new branch] gh/huydhn/5/next -> origin/gh/huydhn/5/next 2025-12-04T08:57:43.7562663Z * [new branch] gh/huydhn/6/next -> origin/gh/huydhn/6/next 2025-12-04T08:57:43.7564398Z * [new branch] gh/int3/97/base -> origin/gh/int3/97/base 2025-12-04T08:57:43.7565508Z * [new branch] gh/int3/97/head -> origin/gh/int3/97/head 2025-12-04T08:57:43.7567378Z * [new branch] gh/isuruf/101/base -> origin/gh/isuruf/101/base 2025-12-04T08:57:43.7568533Z * [new branch] gh/isuruf/101/head -> origin/gh/isuruf/101/head 2025-12-04T08:57:43.7569888Z * [new branch] gh/isuruf/146/base -> origin/gh/isuruf/146/base 2025-12-04T08:57:43.7570987Z * [new branch] gh/isuruf/146/head -> origin/gh/isuruf/146/head 2025-12-04T08:57:43.7572127Z * [new branch] gh/isuruf/146/orig -> origin/gh/isuruf/146/orig 2025-12-04T08:57:43.7573895Z * [new branch] gh/isuruf/158/base -> origin/gh/isuruf/158/base 2025-12-04T08:57:43.7575038Z * [new branch] gh/isuruf/158/head -> origin/gh/isuruf/158/head 2025-12-04T08:57:43.7576399Z * [new branch] gh/isuruf/159/base -> origin/gh/isuruf/159/base 2025-12-04T08:57:43.7577530Z * [new branch] gh/isuruf/159/head -> origin/gh/isuruf/159/head 2025-12-04T08:57:43.7579177Z * [new branch] gh/isuruf/160/base -> origin/gh/isuruf/160/base 2025-12-04T08:57:43.7580413Z * [new branch] gh/isuruf/160/head -> origin/gh/isuruf/160/head 2025-12-04T08:57:43.7581493Z * [new branch] gh/isuruf/160/orig -> origin/gh/isuruf/160/orig 2025-12-04T08:57:43.7583135Z * [new branch] gh/isuruf/81/base -> origin/gh/isuruf/81/base 2025-12-04T08:57:43.7584242Z * [new branch] gh/isuruf/81/head -> origin/gh/isuruf/81/head 2025-12-04T08:57:43.7585849Z * [new branch] gh/isuruf/81/orig -> origin/gh/isuruf/81/orig 2025-12-04T08:57:43.7587698Z * [new branch] gh/jamesjwu/176/base -> origin/gh/jamesjwu/176/base 2025-12-04T08:57:43.7588830Z * [new branch] gh/jamesjwu/176/head -> origin/gh/jamesjwu/176/head 2025-12-04T08:57:43.7589999Z * [new branch] gh/jamesjwu/176/orig -> origin/gh/jamesjwu/176/orig 2025-12-04T08:57:43.7591543Z * [new branch] gh/jamesjwu/187/base -> origin/gh/jamesjwu/187/base 2025-12-04T08:57:43.7592652Z * [new branch] gh/jamesjwu/187/head -> origin/gh/jamesjwu/187/head 2025-12-04T08:57:43.7593740Z * [new branch] gh/jamesjwu/187/orig -> origin/gh/jamesjwu/187/orig 2025-12-04T08:57:43.7595223Z * [new branch] gh/jamesjwu/196/base -> origin/gh/jamesjwu/196/base 2025-12-04T08:57:43.7596304Z * [new branch] gh/jamesjwu/196/head -> origin/gh/jamesjwu/196/head 2025-12-04T08:57:43.7597482Z * [new branch] gh/jamesjwu/196/orig -> origin/gh/jamesjwu/196/orig 2025-12-04T08:57:43.7598894Z * [new branch] gh/jamesjwu/198/base -> origin/gh/jamesjwu/198/base 
2025-12-04T08:57:43.7600045Z * [new branch] gh/jamesjwu/198/head -> origin/gh/jamesjwu/198/head 2025-12-04T08:57:43.7601170Z * [new branch] gh/jamesjwu/198/orig -> origin/gh/jamesjwu/198/orig 2025-12-04T08:57:43.7603088Z * [new branch] gh/jamesjwu/207/base -> origin/gh/jamesjwu/207/base 2025-12-04T08:57:43.7604445Z * [new branch] gh/jamesjwu/207/head -> origin/gh/jamesjwu/207/head 2025-12-04T08:57:43.7605564Z * [new branch] gh/jamesjwu/207/orig -> origin/gh/jamesjwu/207/orig 2025-12-04T08:57:43.7607211Z * [new branch] gh/jamesjwu/208/base -> origin/gh/jamesjwu/208/base 2025-12-04T08:57:43.7608208Z * [new branch] gh/jamesjwu/208/head -> origin/gh/jamesjwu/208/head 2025-12-04T08:57:43.7609358Z * [new branch] gh/jamesjwu/208/orig -> origin/gh/jamesjwu/208/orig 2025-12-04T08:57:43.7610836Z * [new branch] gh/jamesjwu/52/base -> origin/gh/jamesjwu/52/base 2025-12-04T08:57:43.7611965Z * [new branch] gh/jamesjwu/52/head -> origin/gh/jamesjwu/52/head 2025-12-04T08:57:43.7613610Z * [new branch] gh/jamesjwu/53/base -> origin/gh/jamesjwu/53/base 2025-12-04T08:57:43.7614855Z * [new branch] gh/jamesjwu/53/head -> origin/gh/jamesjwu/53/head 2025-12-04T08:57:43.7636065Z * [new branch] gh/jamesjwu/54/base -> origin/gh/jamesjwu/54/base 2025-12-04T08:57:43.7636555Z * [new branch] gh/jamesjwu/54/head -> origin/gh/jamesjwu/54/head 2025-12-04T08:57:43.7636813Z * [new branch] gh/jamesjwu/55/base -> origin/gh/jamesjwu/55/base 2025-12-04T08:57:43.7637050Z * [new branch] gh/jamesjwu/55/head -> origin/gh/jamesjwu/55/head 2025-12-04T08:57:43.7637297Z * [new branch] gh/jamesjwu/56/base -> origin/gh/jamesjwu/56/base 2025-12-04T08:57:43.7637530Z * [new branch] gh/jamesjwu/56/head -> origin/gh/jamesjwu/56/head 2025-12-04T08:57:43.7637762Z * [new branch] gh/jamesjwu/57/base -> origin/gh/jamesjwu/57/base 2025-12-04T08:57:43.7638006Z * [new branch] gh/jamesjwu/57/head -> origin/gh/jamesjwu/57/head 2025-12-04T08:57:43.7638243Z * [new branch] gh/jamesjwu/58/base -> origin/gh/jamesjwu/58/base 2025-12-04T08:57:43.7638490Z * [new branch] gh/jamesjwu/58/head -> origin/gh/jamesjwu/58/head 2025-12-04T08:57:43.7638725Z * [new branch] gh/jamesjwu/59/base -> origin/gh/jamesjwu/59/base 2025-12-04T08:57:43.7638957Z * [new branch] gh/jamesjwu/59/head -> origin/gh/jamesjwu/59/head 2025-12-04T08:57:43.7639202Z * [new branch] gh/jamesjwu/60/base -> origin/gh/jamesjwu/60/base 2025-12-04T08:57:43.7639434Z * [new branch] gh/jamesjwu/60/head -> origin/gh/jamesjwu/60/head 2025-12-04T08:57:43.7639666Z * [new branch] gh/jamesjwu/61/base -> origin/gh/jamesjwu/61/base 2025-12-04T08:57:43.7639912Z * [new branch] gh/jamesjwu/61/head -> origin/gh/jamesjwu/61/head 2025-12-04T08:57:43.7640146Z * [new branch] gh/jamesjwu/62/base -> origin/gh/jamesjwu/62/base 2025-12-04T08:57:43.7640400Z * [new branch] gh/jamesjwu/62/head -> origin/gh/jamesjwu/62/head 2025-12-04T08:57:43.7640638Z * [new branch] gh/jamesjwu/63/base -> origin/gh/jamesjwu/63/base 2025-12-04T08:57:43.7640869Z * [new branch] gh/jamesjwu/63/head -> origin/gh/jamesjwu/63/head 2025-12-04T08:57:43.7641593Z * [new branch] gh/jamesjwu/64/base -> origin/gh/jamesjwu/64/base 2025-12-04T08:57:43.7642760Z * [new branch] gh/jamesjwu/64/head -> origin/gh/jamesjwu/64/head 2025-12-04T08:57:43.7644129Z * [new branch] gh/jamesjwu/65/base -> origin/gh/jamesjwu/65/base 2025-12-04T08:57:43.7645127Z * [new branch] gh/jamesjwu/65/head -> origin/gh/jamesjwu/65/head 2025-12-04T08:57:43.7646971Z * [new branch] gh/janeyx99/165/base -> origin/gh/janeyx99/165/base 2025-12-04T08:57:43.7648113Z * [new branch] 
gh/janeyx99/165/head -> origin/gh/janeyx99/165/head 2025-12-04T08:57:43.7649204Z * [new branch] gh/janeyx99/165/orig -> origin/gh/janeyx99/165/orig 2025-12-04T08:57:43.7650553Z * [new branch] gh/janeyx99/201/base -> origin/gh/janeyx99/201/base 2025-12-04T08:57:43.7651612Z * [new branch] gh/janeyx99/201/head -> origin/gh/janeyx99/201/head 2025-12-04T08:57:43.7652771Z * [new branch] gh/janeyx99/201/orig -> origin/gh/janeyx99/201/orig 2025-12-04T08:57:43.7654952Z * [new branch] gh/janeyx99/225/base -> origin/gh/janeyx99/225/base 2025-12-04T08:57:43.7656127Z * [new branch] gh/janeyx99/225/head -> origin/gh/janeyx99/225/head 2025-12-04T08:57:43.7657225Z * [new branch] gh/janeyx99/225/orig -> origin/gh/janeyx99/225/orig 2025-12-04T08:57:43.7658738Z * [new branch] gh/janeyx99/299/base -> origin/gh/janeyx99/299/base 2025-12-04T08:57:43.7659906Z * [new branch] gh/janeyx99/299/head -> origin/gh/janeyx99/299/head 2025-12-04T08:57:43.7661088Z * [new branch] gh/janeyx99/299/orig -> origin/gh/janeyx99/299/orig 2025-12-04T08:57:43.7662968Z * [new branch] gh/janeyx99/302/base -> origin/gh/janeyx99/302/base 2025-12-04T08:57:43.7664119Z * [new branch] gh/janeyx99/302/head -> origin/gh/janeyx99/302/head 2025-12-04T08:57:43.7665472Z * [new branch] gh/janeyx99/303/base -> origin/gh/janeyx99/303/base 2025-12-04T08:57:43.7666627Z * [new branch] gh/janeyx99/303/head -> origin/gh/janeyx99/303/head 2025-12-04T08:57:43.7668102Z * [new branch] gh/janeyx99/305/base -> origin/gh/janeyx99/305/base 2025-12-04T08:57:43.7669293Z * [new branch] gh/janeyx99/305/head -> origin/gh/janeyx99/305/head 2025-12-04T08:57:43.7671050Z * [new branch] gh/janeyx99/306/base -> origin/gh/janeyx99/306/base 2025-12-04T08:57:43.7672074Z * [new branch] gh/janeyx99/306/head -> origin/gh/janeyx99/306/head 2025-12-04T08:57:43.7673585Z * [new branch] gh/janeyx99/314/base -> origin/gh/janeyx99/314/base 2025-12-04T08:57:43.7674743Z * [new branch] gh/janeyx99/314/head -> origin/gh/janeyx99/314/head 2025-12-04T08:57:43.7675857Z * [new branch] gh/janeyx99/314/orig -> origin/gh/janeyx99/314/orig 2025-12-04T08:57:43.7677369Z * [new branch] gh/janeyx99/315/base -> origin/gh/janeyx99/315/base 2025-12-04T08:57:43.7678508Z * [new branch] gh/janeyx99/315/head -> origin/gh/janeyx99/315/head 2025-12-04T08:57:43.7680198Z * [new branch] gh/janeyx99/315/orig -> origin/gh/janeyx99/315/orig 2025-12-04T08:57:43.7681693Z * [new branch] gh/janeyx99/316/base -> origin/gh/janeyx99/316/base 2025-12-04T08:57:43.7682868Z * [new branch] gh/janeyx99/316/head -> origin/gh/janeyx99/316/head 2025-12-04T08:57:43.7683994Z * [new branch] gh/janeyx99/316/orig -> origin/gh/janeyx99/316/orig 2025-12-04T08:57:43.7685809Z * [new branch] gh/janeyx99/317/base -> origin/gh/janeyx99/317/base 2025-12-04T08:57:43.7686893Z * [new branch] gh/janeyx99/317/head -> origin/gh/janeyx99/317/head 2025-12-04T08:57:43.7688025Z * [new branch] gh/janeyx99/317/orig -> origin/gh/janeyx99/317/orig 2025-12-04T08:57:43.7689578Z * [new branch] gh/janeyx99/325/base -> origin/gh/janeyx99/325/base 2025-12-04T08:57:43.7691288Z * [new branch] gh/janeyx99/325/head -> origin/gh/janeyx99/325/head 2025-12-04T08:57:43.7692549Z * [new branch] gh/janeyx99/325/orig -> origin/gh/janeyx99/325/orig 2025-12-04T08:57:43.7694390Z * [new branch] gh/janeyx99/327/base -> origin/gh/janeyx99/327/base 2025-12-04T08:57:43.7695484Z * [new branch] gh/janeyx99/327/head -> origin/gh/janeyx99/327/head 2025-12-04T08:57:43.7696649Z * [new branch] gh/janeyx99/327/orig -> origin/gh/janeyx99/327/orig 2025-12-04T08:57:43.7698149Z * [new branch] 
gh/janeyx99/328/base -> origin/gh/janeyx99/328/base 2025-12-04T08:57:43.7699330Z * [new branch] gh/janeyx99/328/head -> origin/gh/janeyx99/328/head 2025-12-04T08:57:43.7700526Z * [new branch] gh/janeyx99/328/orig -> origin/gh/janeyx99/328/orig 2025-12-04T08:57:43.7702000Z * [new branch] gh/janeyx99/329/base -> origin/gh/janeyx99/329/base 2025-12-04T08:57:43.7703168Z * [new branch] gh/janeyx99/329/head -> origin/gh/janeyx99/329/head 2025-12-04T08:57:43.7704299Z * [new branch] gh/janeyx99/329/orig -> origin/gh/janeyx99/329/orig 2025-12-04T08:57:43.7706364Z * [new branch] gh/janeyx99/330/base -> origin/gh/janeyx99/330/base 2025-12-04T08:57:43.7707495Z * [new branch] gh/janeyx99/330/head -> origin/gh/janeyx99/330/head 2025-12-04T08:57:43.7708843Z * [new branch] gh/janeyx99/330/orig -> origin/gh/janeyx99/330/orig 2025-12-04T08:57:43.7710597Z * [new branch] gh/janeyx99/331/base -> origin/gh/janeyx99/331/base 2025-12-04T08:57:43.7711694Z * [new branch] gh/janeyx99/331/head -> origin/gh/janeyx99/331/head 2025-12-04T08:57:43.7712943Z * [new branch] gh/janeyx99/331/orig -> origin/gh/janeyx99/331/orig 2025-12-04T08:57:43.7714301Z * [new branch] gh/janeyx99/332/base -> origin/gh/janeyx99/332/base 2025-12-04T08:57:43.7715371Z * [new branch] gh/janeyx99/332/head -> origin/gh/janeyx99/332/head 2025-12-04T08:57:43.7716476Z * [new branch] gh/janeyx99/332/orig -> origin/gh/janeyx99/332/orig 2025-12-04T08:57:43.7717954Z * [new branch] gh/janeyx99/333/base -> origin/gh/janeyx99/333/base 2025-12-04T08:57:43.7719067Z * [new branch] gh/janeyx99/333/head -> origin/gh/janeyx99/333/head 2025-12-04T08:57:43.7720137Z * [new branch] gh/janeyx99/333/orig -> origin/gh/janeyx99/333/orig 2025-12-04T08:57:43.7721737Z * [new branch] gh/janeyx99/88/base -> origin/gh/janeyx99/88/base 2025-12-04T08:57:43.7722935Z * [new branch] gh/janeyx99/88/head -> origin/gh/janeyx99/88/head 2025-12-04T08:57:43.7724043Z * [new branch] gh/janeyx99/88/orig -> origin/gh/janeyx99/88/orig 2025-12-04T08:57:43.7725995Z * [new branch] gh/jansel/360/base -> origin/gh/jansel/360/base 2025-12-04T08:57:43.7727438Z * [new branch] gh/jansel/360/head -> origin/gh/jansel/360/head 2025-12-04T08:57:43.7728851Z * [new branch] gh/jansel/451/base -> origin/gh/jansel/451/base 2025-12-04T08:57:43.7729917Z * [new branch] gh/jansel/451/head -> origin/gh/jansel/451/head 2025-12-04T08:57:43.7731077Z * [new branch] gh/jansel/451/orig -> origin/gh/jansel/451/orig 2025-12-04T08:57:43.7732618Z * [new branch] gh/jansel/462/base -> origin/gh/jansel/462/base 2025-12-04T08:57:43.7733974Z * [new branch] gh/jansel/462/head -> origin/gh/jansel/462/head 2025-12-04T08:57:43.7735126Z * [new branch] gh/jansel/462/orig -> origin/gh/jansel/462/orig 2025-12-04T08:57:43.7736632Z * [new branch] gh/jansel/533/base -> origin/gh/jansel/533/base 2025-12-04T08:57:43.7737737Z * [new branch] gh/jansel/533/head -> origin/gh/jansel/533/head 2025-12-04T08:57:43.7738833Z * [new branch] gh/jansel/533/orig -> origin/gh/jansel/533/orig 2025-12-04T08:57:43.7740305Z * [new branch] gh/jansel/552/base -> origin/gh/jansel/552/base 2025-12-04T08:57:43.7741406Z * [new branch] gh/jansel/552/head -> origin/gh/jansel/552/head 2025-12-04T08:57:43.7742580Z * [new branch] gh/jansel/552/orig -> origin/gh/jansel/552/orig 2025-12-04T08:57:43.7744075Z * [new branch] gh/jansel/553/base -> origin/gh/jansel/553/base 2025-12-04T08:57:43.7745211Z * [new branch] gh/jansel/553/head -> origin/gh/jansel/553/head 2025-12-04T08:57:43.7746421Z * [new branch] gh/jansel/553/orig -> origin/gh/jansel/553/orig 
2025-12-04T08:57:43.7748503Z * [new branch] gh/jansel/554/base -> origin/gh/jansel/554/base 2025-12-04T08:57:43.7749556Z * [new branch] gh/jansel/554/head -> origin/gh/jansel/554/head 2025-12-04T08:57:43.7751110Z * [new branch] gh/jansel/554/orig -> origin/gh/jansel/554/orig 2025-12-04T08:57:43.7752550Z * [new branch] gh/jansel/555/base -> origin/gh/jansel/555/base 2025-12-04T08:57:43.7753677Z * [new branch] gh/jansel/555/head -> origin/gh/jansel/555/head 2025-12-04T08:57:43.7754809Z * [new branch] gh/jansel/555/orig -> origin/gh/jansel/555/orig 2025-12-04T08:57:43.7756183Z * [new branch] gh/jansel/556/base -> origin/gh/jansel/556/base 2025-12-04T08:57:43.7757250Z * [new branch] gh/jansel/556/head -> origin/gh/jansel/556/head 2025-12-04T08:57:43.7758356Z * [new branch] gh/jansel/556/orig -> origin/gh/jansel/556/orig 2025-12-04T08:57:43.7759976Z * [new branch] gh/jansel/557/base -> origin/gh/jansel/557/base 2025-12-04T08:57:43.7761110Z * [new branch] gh/jansel/557/head -> origin/gh/jansel/557/head 2025-12-04T08:57:43.7762361Z * [new branch] gh/jansel/557/orig -> origin/gh/jansel/557/orig 2025-12-04T08:57:43.7763804Z * [new branch] gh/jansel/558/base -> origin/gh/jansel/558/base 2025-12-04T08:57:43.7764883Z * [new branch] gh/jansel/558/head -> origin/gh/jansel/558/head 2025-12-04T08:57:43.7765961Z * [new branch] gh/jansel/558/orig -> origin/gh/jansel/558/orig 2025-12-04T08:57:43.7767363Z * [new branch] gh/jansel/559/base -> origin/gh/jansel/559/base 2025-12-04T08:57:43.7768421Z * [new branch] gh/jansel/559/head -> origin/gh/jansel/559/head 2025-12-04T08:57:43.7769491Z * [new branch] gh/jansel/559/orig -> origin/gh/jansel/559/orig 2025-12-04T08:57:43.7770924Z * [new branch] gh/jansel/560/base -> origin/gh/jansel/560/base 2025-12-04T08:57:43.7772004Z * [new branch] gh/jansel/560/head -> origin/gh/jansel/560/head 2025-12-04T08:57:43.7773141Z * [new branch] gh/jansel/560/orig -> origin/gh/jansel/560/orig 2025-12-04T08:57:43.7774954Z * [new branch] gh/jansel/561/base -> origin/gh/jansel/561/base 2025-12-04T08:57:43.7776051Z * [new branch] gh/jansel/561/head -> origin/gh/jansel/561/head 2025-12-04T08:57:43.7777183Z * [new branch] gh/jansel/561/orig -> origin/gh/jansel/561/orig 2025-12-04T08:57:43.7778935Z * [new branch] gh/jansel/562/base -> origin/gh/jansel/562/base 2025-12-04T08:57:43.7780176Z * [new branch] gh/jansel/562/head -> origin/gh/jansel/562/head 2025-12-04T08:57:43.7781262Z * [new branch] gh/jansel/562/orig -> origin/gh/jansel/562/orig 2025-12-04T08:57:43.7782739Z * [new branch] gh/jansel/563/base -> origin/gh/jansel/563/base 2025-12-04T08:57:43.7783864Z * [new branch] gh/jansel/563/head -> origin/gh/jansel/563/head 2025-12-04T08:57:43.7785002Z * [new branch] gh/jansel/563/orig -> origin/gh/jansel/563/orig 2025-12-04T08:57:43.7786943Z * [new branch] gh/jansel/564/base -> origin/gh/jansel/564/base 2025-12-04T08:57:43.7788350Z * [new branch] gh/jansel/564/head -> origin/gh/jansel/564/head 2025-12-04T08:57:43.7789219Z * [new branch] gh/jansel/564/orig -> origin/gh/jansel/564/orig 2025-12-04T08:57:43.7790897Z * [new branch] gh/jansel/565/base -> origin/gh/jansel/565/base 2025-12-04T08:57:43.7791977Z * [new branch] gh/jansel/565/head -> origin/gh/jansel/565/head 2025-12-04T08:57:43.7793094Z * [new branch] gh/jansel/565/orig -> origin/gh/jansel/565/orig 2025-12-04T08:57:43.7794715Z * [new branch] gh/jansel/566/base -> origin/gh/jansel/566/base 2025-12-04T08:57:43.7795810Z * [new branch] gh/jansel/566/head -> origin/gh/jansel/566/head 2025-12-04T08:57:43.7796879Z * [new branch] 
gh/jansel/566/orig -> origin/gh/jansel/566/orig 2025-12-04T08:57:43.7798395Z * [new branch] gh/jansel/567/base -> origin/gh/jansel/567/base 2025-12-04T08:57:43.7799466Z * [new branch] gh/jansel/567/head -> origin/gh/jansel/567/head 2025-12-04T08:57:43.7800654Z * [new branch] gh/jansel/567/orig -> origin/gh/jansel/567/orig 2025-12-04T08:57:43.7802106Z * [new branch] gh/jansel/568/base -> origin/gh/jansel/568/base 2025-12-04T08:57:43.7803305Z * [new branch] gh/jansel/568/head -> origin/gh/jansel/568/head 2025-12-04T08:57:43.7804416Z * [new branch] gh/jansel/568/orig -> origin/gh/jansel/568/orig 2025-12-04T08:57:43.7805883Z * [new branch] gh/jansel/569/base -> origin/gh/jansel/569/base 2025-12-04T08:57:43.7806956Z * [new branch] gh/jansel/569/head -> origin/gh/jansel/569/head 2025-12-04T08:57:43.7808033Z * [new branch] gh/jansel/569/orig -> origin/gh/jansel/569/orig 2025-12-04T08:57:43.7810039Z * [new branch] gh/jansel/570/base -> origin/gh/jansel/570/base 2025-12-04T08:57:43.7811133Z * [new branch] gh/jansel/570/head -> origin/gh/jansel/570/head 2025-12-04T08:57:43.7812252Z * [new branch] gh/jansel/570/orig -> origin/gh/jansel/570/orig 2025-12-04T08:57:43.7814038Z * [new branch] gh/jansel/571/base -> origin/gh/jansel/571/base 2025-12-04T08:57:43.7815192Z * [new branch] gh/jansel/571/head -> origin/gh/jansel/571/head 2025-12-04T08:57:43.7816363Z * [new branch] gh/jansel/571/orig -> origin/gh/jansel/571/orig 2025-12-04T08:57:43.7817890Z * [new branch] gh/jansel/572/base -> origin/gh/jansel/572/base 2025-12-04T08:57:43.7818969Z * [new branch] gh/jansel/572/head -> origin/gh/jansel/572/head 2025-12-04T08:57:43.7820139Z * [new branch] gh/jansel/572/orig -> origin/gh/jansel/572/orig 2025-12-04T08:57:43.7821710Z * [new branch] gh/jansel/573/base -> origin/gh/jansel/573/base 2025-12-04T08:57:43.7822890Z * [new branch] gh/jansel/573/head -> origin/gh/jansel/573/head 2025-12-04T08:57:43.7824005Z * [new branch] gh/jansel/573/orig -> origin/gh/jansel/573/orig 2025-12-04T08:57:43.7825972Z * [new branch] gh/jansel/574/base -> origin/gh/jansel/574/base 2025-12-04T08:57:43.7827077Z * [new branch] gh/jansel/574/head -> origin/gh/jansel/574/head 2025-12-04T08:57:43.7828163Z * [new branch] gh/jansel/574/orig -> origin/gh/jansel/574/orig 2025-12-04T08:57:43.7829684Z * [new branch] gh/jansel/575/base -> origin/gh/jansel/575/base 2025-12-04T08:57:43.7830751Z * [new branch] gh/jansel/575/head -> origin/gh/jansel/575/head 2025-12-04T08:57:43.7831856Z * [new branch] gh/jansel/575/orig -> origin/gh/jansel/575/orig 2025-12-04T08:57:43.7833321Z * [new branch] gh/jansel/576/base -> origin/gh/jansel/576/base 2025-12-04T08:57:43.7834451Z * [new branch] gh/jansel/576/head -> origin/gh/jansel/576/head 2025-12-04T08:57:43.7835548Z * [new branch] gh/jansel/576/orig -> origin/gh/jansel/576/orig 2025-12-04T08:57:43.7837380Z * [new branch] gh/jbschlosser/247/base -> origin/gh/jbschlosser/247/base 2025-12-04T08:57:43.7838493Z * [new branch] gh/jbschlosser/247/head -> origin/gh/jbschlosser/247/head 2025-12-04T08:57:43.7839641Z * [new branch] gh/jbschlosser/247/orig -> origin/gh/jbschlosser/247/orig 2025-12-04T08:57:43.7841147Z * [new branch] gh/jbschlosser/250/base -> origin/gh/jbschlosser/250/base 2025-12-04T08:57:43.7842185Z * [new branch] gh/jbschlosser/250/head -> origin/gh/jbschlosser/250/head 2025-12-04T08:57:43.7843367Z * [new branch] gh/jbschlosser/250/orig -> origin/gh/jbschlosser/250/orig 2025-12-04T08:57:43.7845621Z * [new branch] gh/jerryzh168/1/base -> origin/gh/jerryzh168/1/base 2025-12-04T08:57:43.7846734Z * [new 
branch] gh/jerryzh168/1/head -> origin/gh/jerryzh168/1/head 2025-12-04T08:57:43.7847749Z * [new branch] gh/jerryzh168/1/orig -> origin/gh/jerryzh168/1/orig 2025-12-04T08:57:43.7849450Z * [new branch] gh/jiayisunx/59/base -> origin/gh/jiayisunx/59/base 2025-12-04T08:57:43.7850593Z * [new branch] gh/jiayisunx/59/head -> origin/gh/jiayisunx/59/head 2025-12-04T08:57:43.7851717Z * [new branch] gh/jiayisunx/59/orig -> origin/gh/jiayisunx/59/orig 2025-12-04T08:57:43.7853243Z * [new branch] gh/jiayisunx/61/base -> origin/gh/jiayisunx/61/base 2025-12-04T08:57:43.7854686Z * [new branch] gh/jiayisunx/61/head -> origin/gh/jiayisunx/61/head 2025-12-04T08:57:43.7855826Z * [new branch] gh/jiayisunx/61/orig -> origin/gh/jiayisunx/61/orig 2025-12-04T08:57:43.7857322Z * [new branch] gh/jiayisunx/68/base -> origin/gh/jiayisunx/68/base 2025-12-04T08:57:43.7858388Z * [new branch] gh/jiayisunx/68/head -> origin/gh/jiayisunx/68/head 2025-12-04T08:57:43.7859506Z * [new branch] gh/jiayisunx/68/orig -> origin/gh/jiayisunx/68/orig 2025-12-04T08:57:43.7861009Z * [new branch] gh/jiayisunx/77/base -> origin/gh/jiayisunx/77/base 2025-12-04T08:57:43.7862140Z * [new branch] gh/jiayisunx/77/head -> origin/gh/jiayisunx/77/head 2025-12-04T08:57:43.7863291Z * [new branch] gh/jiayisunx/77/orig -> origin/gh/jiayisunx/77/orig 2025-12-04T08:57:43.7864791Z * [new branch] gh/jiayisunx/78/base -> origin/gh/jiayisunx/78/base 2025-12-04T08:57:43.7866023Z * [new branch] gh/jiayisunx/78/head -> origin/gh/jiayisunx/78/head 2025-12-04T08:57:43.7867119Z * [new branch] gh/jiayisunx/78/orig -> origin/gh/jiayisunx/78/orig 2025-12-04T08:57:43.7868657Z * [new branch] gh/jiayisunx/79/base -> origin/gh/jiayisunx/79/base 2025-12-04T08:57:43.7869768Z * [new branch] gh/jiayisunx/79/head -> origin/gh/jiayisunx/79/head 2025-12-04T08:57:43.7870849Z * [new branch] gh/jiayisunx/79/orig -> origin/gh/jiayisunx/79/orig 2025-12-04T08:57:43.7872364Z * [new branch] gh/jiayisunx/82/base -> origin/gh/jiayisunx/82/base 2025-12-04T08:57:43.7873452Z * [new branch] gh/jiayisunx/82/head -> origin/gh/jiayisunx/82/head 2025-12-04T08:57:43.7874523Z * [new branch] gh/jiayisunx/82/orig -> origin/gh/jiayisunx/82/orig 2025-12-04T08:57:43.7875904Z * [new branch] gh/jiayisunx/83/base -> origin/gh/jiayisunx/83/base 2025-12-04T08:57:43.7877024Z * [new branch] gh/jiayisunx/83/head -> origin/gh/jiayisunx/83/head 2025-12-04T08:57:43.7878134Z * [new branch] gh/jiayisunx/83/orig -> origin/gh/jiayisunx/83/orig 2025-12-04T08:57:43.7879971Z * [new branch] gh/jiayisunx/84/base -> origin/gh/jiayisunx/84/base 2025-12-04T08:57:43.7881245Z * [new branch] gh/jiayisunx/84/head -> origin/gh/jiayisunx/84/head 2025-12-04T08:57:43.7882367Z * [new branch] gh/jiayisunx/84/orig -> origin/gh/jiayisunx/84/orig 2025-12-04T08:57:43.7884067Z * [new branch] gh/jiayisunx/85/base -> origin/gh/jiayisunx/85/base 2025-12-04T08:57:43.7885149Z * [new branch] gh/jiayisunx/85/head -> origin/gh/jiayisunx/85/head 2025-12-04T08:57:43.7886270Z * [new branch] gh/jiayisunx/85/orig -> origin/gh/jiayisunx/85/orig 2025-12-04T08:57:43.7887701Z * [new branch] gh/jiayisunx/86/base -> origin/gh/jiayisunx/86/base 2025-12-04T08:57:43.7888856Z * [new branch] gh/jiayisunx/86/head -> origin/gh/jiayisunx/86/head 2025-12-04T08:57:43.7890152Z * [new branch] gh/jiayisunx/86/orig -> origin/gh/jiayisunx/86/orig 2025-12-04T08:57:43.7891874Z * [new branch] gh/jiayisunx/87/base -> origin/gh/jiayisunx/87/base 2025-12-04T08:57:43.7892811Z * [new branch] gh/jiayisunx/87/head -> origin/gh/jiayisunx/87/head 2025-12-04T08:57:43.7894334Z * [new 
branch] gh/jiayisunx/87/orig -> origin/gh/jiayisunx/87/orig 2025-12-04T08:57:43.7895766Z * [new branch] gh/jiayisunx/88/base -> origin/gh/jiayisunx/88/base 2025-12-04T08:57:43.7896917Z * [new branch] gh/jiayisunx/88/head -> origin/gh/jiayisunx/88/head 2025-12-04T08:57:43.7898015Z * [new branch] gh/jiayisunx/88/orig -> origin/gh/jiayisunx/88/orig 2025-12-04T08:57:43.7899708Z * [new branch] gh/jiayisunx/89/base -> origin/gh/jiayisunx/89/base 2025-12-04T08:57:43.7900709Z * [new branch] gh/jiayisunx/89/head -> origin/gh/jiayisunx/89/head 2025-12-04T08:57:43.7901819Z * [new branch] gh/jiayisunx/89/orig -> origin/gh/jiayisunx/89/orig 2025-12-04T08:57:43.7903312Z * [new branch] gh/jiayisunx/90/base -> origin/gh/jiayisunx/90/base 2025-12-04T08:57:43.7904405Z * [new branch] gh/jiayisunx/90/head -> origin/gh/jiayisunx/90/head 2025-12-04T08:57:43.7905638Z * [new branch] gh/jiayisunx/90/orig -> origin/gh/jiayisunx/90/orig 2025-12-04T08:57:43.7907255Z * [new branch] gh/jjwu@meta.com/1/base -> origin/gh/jjwu@meta.com/1/base 2025-12-04T08:57:43.7908340Z * [new branch] gh/jjwu@meta.com/1/head -> origin/gh/jjwu@meta.com/1/head 2025-12-04T08:57:43.7910029Z * [new branch] gh/jturney/1/base -> origin/gh/jturney/1/base 2025-12-04T08:57:43.7911127Z * [new branch] gh/jturney/1/head -> origin/gh/jturney/1/head 2025-12-04T08:57:43.7912230Z * [new branch] gh/jturney/1/orig -> origin/gh/jturney/1/orig 2025-12-04T08:57:43.7913734Z * [new branch] gh/jturney/2/base -> origin/gh/jturney/2/base 2025-12-04T08:57:43.7914832Z * [new branch] gh/jturney/2/head -> origin/gh/jturney/2/head 2025-12-04T08:57:43.7916360Z * [new branch] gh/jturney/2/orig -> origin/gh/jturney/2/orig 2025-12-04T08:57:43.7918763Z * [new branch] gh/karthickai/10/base -> origin/gh/karthickai/10/base 2025-12-04T08:57:43.7919976Z * [new branch] gh/karthickai/10/head -> origin/gh/karthickai/10/head 2025-12-04T08:57:43.7921106Z * [new branch] gh/karthickai/10/orig -> origin/gh/karthickai/10/orig 2025-12-04T08:57:43.7922593Z * [new branch] gh/karthickai/11/base -> origin/gh/karthickai/11/base 2025-12-04T08:57:43.7923759Z * [new branch] gh/karthickai/11/head -> origin/gh/karthickai/11/head 2025-12-04T08:57:43.7924865Z * [new branch] gh/karthickai/11/orig -> origin/gh/karthickai/11/orig 2025-12-04T08:57:43.7926752Z * [new branch] gh/karthickai/12/base -> origin/gh/karthickai/12/base 2025-12-04T08:57:43.7927942Z * [new branch] gh/karthickai/12/head -> origin/gh/karthickai/12/head 2025-12-04T08:57:43.7929185Z * [new branch] gh/karthickai/12/orig -> origin/gh/karthickai/12/orig 2025-12-04T08:57:43.7930661Z * [new branch] gh/karthickai/13/base -> origin/gh/karthickai/13/base 2025-12-04T08:57:43.7931824Z * [new branch] gh/karthickai/13/head -> origin/gh/karthickai/13/head 2025-12-04T08:57:43.7932916Z * [new branch] gh/karthickai/13/orig -> origin/gh/karthickai/13/orig 2025-12-04T08:57:43.7934974Z * [new branch] gh/karthickai/14/base -> origin/gh/karthickai/14/base 2025-12-04T08:57:43.7936169Z * [new branch] gh/karthickai/14/head -> origin/gh/karthickai/14/head 2025-12-04T08:57:43.7937307Z * [new branch] gh/karthickai/14/orig -> origin/gh/karthickai/14/orig 2025-12-04T08:57:43.7939618Z * [new branch] gh/karthickai/15/base -> origin/gh/karthickai/15/base 2025-12-04T08:57:43.7940746Z * [new branch] gh/karthickai/15/head -> origin/gh/karthickai/15/head 2025-12-04T08:57:43.7941900Z * [new branch] gh/karthickai/15/orig -> origin/gh/karthickai/15/orig 2025-12-04T08:57:43.7943353Z * [new branch] gh/karthickai/16/base -> origin/gh/karthickai/16/base 
2025-12-04T08:57:43.7944527Z * [new branch] gh/karthickai/16/head -> origin/gh/karthickai/16/head 2025-12-04T08:57:43.7945882Z * [new branch] gh/karthickai/16/orig -> origin/gh/karthickai/16/orig 2025-12-04T08:57:43.7947322Z * [new branch] gh/karthickai/17/base -> origin/gh/karthickai/17/base 2025-12-04T08:57:43.7948330Z * [new branch] gh/karthickai/17/head -> origin/gh/karthickai/17/head 2025-12-04T08:57:43.7949441Z * [new branch] gh/karthickai/17/orig -> origin/gh/karthickai/17/orig 2025-12-04T08:57:43.7951032Z * [new branch] gh/karthickai/18/base -> origin/gh/karthickai/18/base 2025-12-04T08:57:43.7952515Z * [new branch] gh/karthickai/18/head -> origin/gh/karthickai/18/head 2025-12-04T08:57:43.7953779Z * [new branch] gh/karthickai/18/orig -> origin/gh/karthickai/18/orig 2025-12-04T08:57:43.7955289Z * [new branch] gh/karthickai/19/base -> origin/gh/karthickai/19/base 2025-12-04T08:57:43.7956410Z * [new branch] gh/karthickai/19/head -> origin/gh/karthickai/19/head 2025-12-04T08:57:43.7957533Z * [new branch] gh/karthickai/19/orig -> origin/gh/karthickai/19/orig 2025-12-04T08:57:43.7959807Z * [new branch] gh/karthickai/20/base -> origin/gh/karthickai/20/base 2025-12-04T08:57:43.7961709Z * [new branch] gh/karthickai/20/head -> origin/gh/karthickai/20/head 2025-12-04T08:57:43.7963022Z * [new branch] gh/karthickai/20/orig -> origin/gh/karthickai/20/orig 2025-12-04T08:57:43.7964626Z * [new branch] gh/karthickai/21/base -> origin/gh/karthickai/21/base 2025-12-04T08:57:43.7965892Z * [new branch] gh/karthickai/21/head -> origin/gh/karthickai/21/head 2025-12-04T08:57:43.7967048Z * [new branch] gh/karthickai/21/orig -> origin/gh/karthickai/21/orig 2025-12-04T08:57:43.7968612Z * [new branch] gh/karthickai/22/base -> origin/gh/karthickai/22/base 2025-12-04T08:57:43.7969653Z * [new branch] gh/karthickai/22/head -> origin/gh/karthickai/22/head 2025-12-04T08:57:43.7970730Z * [new branch] gh/karthickai/22/orig -> origin/gh/karthickai/22/orig 2025-12-04T08:57:43.7972377Z * [new branch] gh/karthickai/23/base -> origin/gh/karthickai/23/base 2025-12-04T08:57:43.7973947Z * [new branch] gh/karthickai/23/head -> origin/gh/karthickai/23/head 2025-12-04T08:57:43.7975091Z * [new branch] gh/karthickai/23/orig -> origin/gh/karthickai/23/orig 2025-12-04T08:57:43.7976644Z * [new branch] gh/karthickai/24/base -> origin/gh/karthickai/24/base 2025-12-04T08:57:43.7977794Z * [new branch] gh/karthickai/24/head -> origin/gh/karthickai/24/head 2025-12-04T08:57:43.7979246Z * [new branch] gh/karthickai/24/orig -> origin/gh/karthickai/24/orig 2025-12-04T08:57:43.7981397Z * [new branch] gh/karthickai/25/base -> origin/gh/karthickai/25/base 2025-12-04T08:57:43.7982648Z * [new branch] gh/karthickai/25/head -> origin/gh/karthickai/25/head 2025-12-04T08:57:43.7983775Z * [new branch] gh/karthickai/25/orig -> origin/gh/karthickai/25/orig 2025-12-04T08:57:43.7985686Z * [new branch] gh/karthickai/26/base -> origin/gh/karthickai/26/base 2025-12-04T08:57:43.7986921Z * [new branch] gh/karthickai/26/head -> origin/gh/karthickai/26/head 2025-12-04T08:57:43.7988204Z * [new branch] gh/karthickai/26/orig -> origin/gh/karthickai/26/orig 2025-12-04T08:57:43.7991359Z * [new branch] gh/karthickai/6/base -> origin/gh/karthickai/6/base 2025-12-04T08:57:43.7993082Z * [new branch] gh/karthickai/6/head -> origin/gh/karthickai/6/head 2025-12-04T08:57:43.7994249Z * [new branch] gh/karthickai/6/orig -> origin/gh/karthickai/6/orig 2025-12-04T08:57:43.7996009Z * [new branch] gh/krocki/1/base -> origin/gh/krocki/1/base 2025-12-04T08:57:43.7997178Z * [new 
branch] gh/krocki/1/head -> origin/gh/krocki/1/head 2025-12-04T08:57:43.7998256Z * [new branch] gh/krocki/1/orig -> origin/gh/krocki/1/orig 2025-12-04T08:57:43.7999751Z * [new branch] gh/krocki/2/base -> origin/gh/krocki/2/base 2025-12-04T08:57:43.8000852Z * [new branch] gh/krocki/2/head -> origin/gh/krocki/2/head 2025-12-04T08:57:43.8001957Z * [new branch] gh/krocki/2/orig -> origin/gh/krocki/2/orig 2025-12-04T08:57:43.8003974Z * [new branch] gh/kurtamohler/60/base -> origin/gh/kurtamohler/60/base 2025-12-04T08:57:43.8004823Z * [new branch] gh/kurtamohler/60/head -> origin/gh/kurtamohler/60/head 2025-12-04T08:57:43.8005951Z * [new branch] gh/kurtamohler/60/orig -> origin/gh/kurtamohler/60/orig 2025-12-04T08:57:43.8007413Z * [new branch] gh/kurtamohler/61/base -> origin/gh/kurtamohler/61/base 2025-12-04T08:57:43.8008521Z * [new branch] gh/kurtamohler/61/head -> origin/gh/kurtamohler/61/head 2025-12-04T08:57:43.8009584Z * [new branch] gh/kurtamohler/61/orig -> origin/gh/kurtamohler/61/orig 2025-12-04T08:57:43.8011131Z * [new branch] gh/kurtamohler/62/base -> origin/gh/kurtamohler/62/base 2025-12-04T08:57:43.8012236Z * [new branch] gh/kurtamohler/62/head -> origin/gh/kurtamohler/62/head 2025-12-04T08:57:43.8013398Z * [new branch] gh/kurtamohler/62/orig -> origin/gh/kurtamohler/62/orig 2025-12-04T08:57:43.8015684Z * [new branch] gh/kurtamohler/63/base -> origin/gh/kurtamohler/63/base 2025-12-04T08:57:43.8016842Z * [new branch] gh/kurtamohler/63/head -> origin/gh/kurtamohler/63/head 2025-12-04T08:57:43.8018018Z * [new branch] gh/kurtamohler/63/orig -> origin/gh/kurtamohler/63/orig 2025-12-04T08:57:43.8019597Z * [new branch] gh/kurtamohler/64/base -> origin/gh/kurtamohler/64/base 2025-12-04T08:57:43.8020681Z * [new branch] gh/kurtamohler/64/head -> origin/gh/kurtamohler/64/head 2025-12-04T08:57:43.8021871Z * [new branch] gh/kurtamohler/64/orig -> origin/gh/kurtamohler/64/orig 2025-12-04T08:57:43.8023375Z * [new branch] gh/kurtamohler/65/base -> origin/gh/kurtamohler/65/base 2025-12-04T08:57:43.8024541Z * [new branch] gh/kurtamohler/65/head -> origin/gh/kurtamohler/65/head 2025-12-04T08:57:43.8026250Z * [new branch] gh/kurtamohler/65/orig -> origin/gh/kurtamohler/65/orig 2025-12-04T08:57:43.8027780Z * [new branch] gh/kurtamohler/66/base -> origin/gh/kurtamohler/66/base 2025-12-04T08:57:43.8028838Z * [new branch] gh/kurtamohler/66/head -> origin/gh/kurtamohler/66/head 2025-12-04T08:57:43.8029965Z * [new branch] gh/kurtamohler/66/orig -> origin/gh/kurtamohler/66/orig 2025-12-04T08:57:43.8031611Z * [new branch] gh/kurtamohler/67/base -> origin/gh/kurtamohler/67/base 2025-12-04T08:57:43.8032705Z * [new branch] gh/kurtamohler/67/head -> origin/gh/kurtamohler/67/head 2025-12-04T08:57:43.8033801Z * [new branch] gh/kurtamohler/67/orig -> origin/gh/kurtamohler/67/orig 2025-12-04T08:57:43.8035635Z * [new branch] gh/kwen2501/130/base -> origin/gh/kwen2501/130/base 2025-12-04T08:57:43.8037048Z * [new branch] gh/kwen2501/130/head -> origin/gh/kwen2501/130/head 2025-12-04T08:57:43.8038077Z * [new branch] gh/kwen2501/130/orig -> origin/gh/kwen2501/130/orig 2025-12-04T08:57:43.8039561Z * [new branch] gh/kwen2501/170/base -> origin/gh/kwen2501/170/base 2025-12-04T08:57:43.8040672Z * [new branch] gh/kwen2501/170/head -> origin/gh/kwen2501/170/head 2025-12-04T08:57:43.8042319Z * [new branch] gh/kwen2501/187/base -> origin/gh/kwen2501/187/base 2025-12-04T08:57:43.8043478Z * [new branch] gh/kwen2501/187/head -> origin/gh/kwen2501/187/head 2025-12-04T08:57:43.8044604Z * [new branch] gh/kwen2501/187/orig -> 
origin/gh/kwen2501/187/orig 2025-12-04T08:57:43.8046074Z * [new branch] gh/kwen2501/188/base -> origin/gh/kwen2501/188/base 2025-12-04T08:57:43.8047176Z * [new branch] gh/kwen2501/188/head -> origin/gh/kwen2501/188/head 2025-12-04T08:57:43.8048258Z * [new branch] gh/kwen2501/188/orig -> origin/gh/kwen2501/188/orig 2025-12-04T08:57:43.8049711Z * [new branch] gh/kwen2501/211/base -> origin/gh/kwen2501/211/base 2025-12-04T08:57:43.8050780Z * [new branch] gh/kwen2501/211/head -> origin/gh/kwen2501/211/head 2025-12-04T08:57:43.8052243Z * [new branch] gh/kwen2501/224/base -> origin/gh/kwen2501/224/base 2025-12-04T08:57:43.8053361Z * [new branch] gh/kwen2501/224/head -> origin/gh/kwen2501/224/head 2025-12-04T08:57:43.8054754Z * [new branch] gh/kwen2501/224/orig -> origin/gh/kwen2501/224/orig 2025-12-04T08:57:43.8056246Z * [new branch] gh/kwen2501/228/base -> origin/gh/kwen2501/228/base 2025-12-04T08:57:43.8057447Z * [new branch] gh/kwen2501/228/head -> origin/gh/kwen2501/228/head 2025-12-04T08:57:43.8058589Z * [new branch] gh/kwen2501/228/orig -> origin/gh/kwen2501/228/orig 2025-12-04T08:57:43.8060210Z * [new branch] gh/kwen2501/234/base -> origin/gh/kwen2501/234/base 2025-12-04T08:57:43.8061355Z * [new branch] gh/kwen2501/234/head -> origin/gh/kwen2501/234/head 2025-12-04T08:57:43.8062490Z * [new branch] gh/kwen2501/234/orig -> origin/gh/kwen2501/234/orig 2025-12-04T08:57:43.8064018Z * [new branch] gh/kwen2501/235/base -> origin/gh/kwen2501/235/base 2025-12-04T08:57:43.8065150Z * [new branch] gh/kwen2501/235/head -> origin/gh/kwen2501/235/head 2025-12-04T08:57:43.8066356Z * [new branch] gh/kwen2501/235/orig -> origin/gh/kwen2501/235/orig 2025-12-04T08:57:43.8067801Z * [new branch] gh/kwen2501/236/base -> origin/gh/kwen2501/236/base 2025-12-04T08:57:43.8068900Z * [new branch] gh/kwen2501/236/head -> origin/gh/kwen2501/236/head 2025-12-04T08:57:43.8070028Z * [new branch] gh/kwen2501/236/orig -> origin/gh/kwen2501/236/orig 2025-12-04T08:57:43.8071423Z * [new branch] gh/kwen2501/237/base -> origin/gh/kwen2501/237/base 2025-12-04T08:57:43.8072597Z * [new branch] gh/kwen2501/237/head -> origin/gh/kwen2501/237/head 2025-12-04T08:57:43.8073697Z * [new branch] gh/kwen2501/237/orig -> origin/gh/kwen2501/237/orig 2025-12-04T08:57:43.8075119Z * [new branch] gh/kwen2501/238/base -> origin/gh/kwen2501/238/base 2025-12-04T08:57:43.8076198Z * [new branch] gh/kwen2501/238/head -> origin/gh/kwen2501/238/head 2025-12-04T08:57:43.8077272Z * [new branch] gh/kwen2501/238/orig -> origin/gh/kwen2501/238/orig 2025-12-04T08:57:43.8078930Z * [new branch] gh/kwen2501/240/base -> origin/gh/kwen2501/240/base 2025-12-04T08:57:43.8084048Z * [new branch] gh/kwen2501/240/head -> origin/gh/kwen2501/240/head 2025-12-04T08:57:43.8085298Z * [new branch] gh/kwen2501/240/orig -> origin/gh/kwen2501/240/orig 2025-12-04T08:57:43.8086726Z * [new branch] gh/kwen2501/241/base -> origin/gh/kwen2501/241/base 2025-12-04T08:57:43.8087843Z * [new branch] gh/kwen2501/241/head -> origin/gh/kwen2501/241/head 2025-12-04T08:57:43.8089023Z * [new branch] gh/kwen2501/241/orig -> origin/gh/kwen2501/241/orig 2025-12-04T08:57:43.8090529Z * [new branch] gh/kwen2501/247/base -> origin/gh/kwen2501/247/base 2025-12-04T08:57:43.8092093Z * [new branch] gh/kwen2501/247/head -> origin/gh/kwen2501/247/head 2025-12-04T08:57:43.8092914Z * [new branch] gh/kwen2501/247/orig -> origin/gh/kwen2501/247/orig 2025-12-04T08:57:43.8094747Z * [new branch] gh/kwen2501/252/base -> origin/gh/kwen2501/252/base 2025-12-04T08:57:43.8095872Z * [new branch] gh/kwen2501/252/head -> 
origin/gh/kwen2501/252/head 2025-12-04T08:57:43.8097013Z * [new branch] gh/kwen2501/252/orig -> origin/gh/kwen2501/252/orig 2025-12-04T08:57:43.8099019Z * [new branch] gh/kwen2501/259/base -> origin/gh/kwen2501/259/base 2025-12-04T08:57:43.8100260Z * [new branch] gh/kwen2501/259/head -> origin/gh/kwen2501/259/head 2025-12-04T08:57:43.8101594Z * [new branch] gh/kwen2501/259/orig -> origin/gh/kwen2501/259/orig 2025-12-04T08:57:43.8103665Z * [new branch] gh/kwen2501/260/base -> origin/gh/kwen2501/260/base 2025-12-04T08:57:43.8104897Z * [new branch] gh/kwen2501/260/head -> origin/gh/kwen2501/260/head 2025-12-04T08:57:43.8106240Z * [new branch] gh/kwen2501/260/orig -> origin/gh/kwen2501/260/orig 2025-12-04T08:57:43.8107697Z * [new branch] gh/kwen2501/268/base -> origin/gh/kwen2501/268/base 2025-12-04T08:57:43.8108901Z * [new branch] gh/kwen2501/268/head -> origin/gh/kwen2501/268/head 2025-12-04T08:57:43.8110005Z * [new branch] gh/kwen2501/268/orig -> origin/gh/kwen2501/268/orig 2025-12-04T08:57:43.8111513Z * [new branch] gh/kwen2501/269/base -> origin/gh/kwen2501/269/base 2025-12-04T08:57:43.8112687Z * [new branch] gh/kwen2501/269/head -> origin/gh/kwen2501/269/head 2025-12-04T08:57:43.8113803Z * [new branch] gh/kwen2501/269/orig -> origin/gh/kwen2501/269/orig 2025-12-04T08:57:43.8115363Z * [new branch] gh/kwen2501/270/base -> origin/gh/kwen2501/270/base 2025-12-04T08:57:43.8116535Z * [new branch] gh/kwen2501/270/head -> origin/gh/kwen2501/270/head 2025-12-04T08:57:43.8117644Z * [new branch] gh/kwen2501/270/orig -> origin/gh/kwen2501/270/orig 2025-12-04T08:57:43.8119202Z * [new branch] gh/kwen2501/271/base -> origin/gh/kwen2501/271/base 2025-12-04T08:57:43.8120383Z * [new branch] gh/kwen2501/271/head -> origin/gh/kwen2501/271/head 2025-12-04T08:57:43.8121499Z * [new branch] gh/kwen2501/271/orig -> origin/gh/kwen2501/271/orig 2025-12-04T08:57:43.8123050Z * [new branch] gh/kwen2501/274/base -> origin/gh/kwen2501/274/base 2025-12-04T08:57:43.8124415Z * [new branch] gh/kwen2501/274/head -> origin/gh/kwen2501/274/head 2025-12-04T08:57:43.8125496Z * [new branch] gh/kwen2501/274/orig -> origin/gh/kwen2501/274/orig 2025-12-04T08:57:43.8127116Z * [new branch] gh/kwen2501/275/base -> origin/gh/kwen2501/275/base 2025-12-04T08:57:43.8128377Z * [new branch] gh/kwen2501/275/head -> origin/gh/kwen2501/275/head 2025-12-04T08:57:43.8129454Z * [new branch] gh/kwen2501/275/orig -> origin/gh/kwen2501/275/orig 2025-12-04T08:57:43.8130916Z * [new branch] gh/kwen2501/276/base -> origin/gh/kwen2501/276/base 2025-12-04T08:57:43.8132166Z * [new branch] gh/kwen2501/276/head -> origin/gh/kwen2501/276/head 2025-12-04T08:57:43.8133260Z * [new branch] gh/kwen2501/276/orig -> origin/gh/kwen2501/276/orig 2025-12-04T08:57:43.8135044Z * [new branch] gh/kwen2501/277/base -> origin/gh/kwen2501/277/base 2025-12-04T08:57:43.8136221Z * [new branch] gh/kwen2501/277/head -> origin/gh/kwen2501/277/head 2025-12-04T08:57:43.8137313Z * [new branch] gh/kwen2501/277/orig -> origin/gh/kwen2501/277/orig 2025-12-04T08:57:43.8138850Z * [new branch] gh/kwen2501/278/base -> origin/gh/kwen2501/278/base 2025-12-04T08:57:43.8140079Z * [new branch] gh/kwen2501/278/head -> origin/gh/kwen2501/278/head 2025-12-04T08:57:43.8141282Z * [new branch] gh/kwen2501/278/orig -> origin/gh/kwen2501/278/orig 2025-12-04T08:57:43.8142884Z * [new branch] gh/kwen2501/279/base -> origin/gh/kwen2501/279/base 2025-12-04T08:57:43.8144180Z * [new branch] gh/kwen2501/279/head -> origin/gh/kwen2501/279/head 2025-12-04T08:57:43.8145512Z * [new branch] gh/kwen2501/279/orig -> 
origin/gh/kwen2501/279/orig 2025-12-04T08:57:43.8147025Z * [new branch] gh/kwen2501/280/base -> origin/gh/kwen2501/280/base 2025-12-04T08:57:43.8148190Z * [new branch] gh/kwen2501/280/head -> origin/gh/kwen2501/280/head 2025-12-04T08:57:43.8149346Z * [new branch] gh/kwen2501/280/orig -> origin/gh/kwen2501/280/orig 2025-12-04T08:57:43.8150818Z * [new branch] gh/kwen2501/281/base -> origin/gh/kwen2501/281/base 2025-12-04T08:57:43.8151971Z * [new branch] gh/kwen2501/281/head -> origin/gh/kwen2501/281/head 2025-12-04T08:57:43.8153037Z * [new branch] gh/kwen2501/281/orig -> origin/gh/kwen2501/281/orig 2025-12-04T08:57:43.8154549Z * [new branch] gh/kwen2501/282/base -> origin/gh/kwen2501/282/base 2025-12-04T08:57:43.8155787Z * [new branch] gh/kwen2501/282/head -> origin/gh/kwen2501/282/head 2025-12-04T08:57:43.8156882Z * [new branch] gh/kwen2501/282/orig -> origin/gh/kwen2501/282/orig 2025-12-04T08:57:43.8158371Z * [new branch] gh/kwen2501/283/base -> origin/gh/kwen2501/283/base 2025-12-04T08:57:43.8159545Z * [new branch] gh/kwen2501/283/head -> origin/gh/kwen2501/283/head 2025-12-04T08:57:43.8160639Z * [new branch] gh/kwen2501/283/orig -> origin/gh/kwen2501/283/orig 2025-12-04T08:57:43.8162151Z * [new branch] gh/kwen2501/284/base -> origin/gh/kwen2501/284/base 2025-12-04T08:57:43.8163327Z * [new branch] gh/kwen2501/284/head -> origin/gh/kwen2501/284/head 2025-12-04T08:57:43.8164454Z * [new branch] gh/kwen2501/284/orig -> origin/gh/kwen2501/284/orig 2025-12-04T08:57:43.8165969Z * [new branch] gh/kwen2501/285/base -> origin/gh/kwen2501/285/base 2025-12-04T08:57:43.8167064Z * [new branch] gh/kwen2501/285/head -> origin/gh/kwen2501/285/head 2025-12-04T08:57:43.8168257Z * [new branch] gh/kwen2501/285/orig -> origin/gh/kwen2501/285/orig 2025-12-04T08:57:43.8169749Z * [new branch] gh/kwen2501/286/base -> origin/gh/kwen2501/286/base 2025-12-04T08:57:43.8170987Z * [new branch] gh/kwen2501/286/head -> origin/gh/kwen2501/286/head 2025-12-04T08:57:43.8172095Z * [new branch] gh/kwen2501/286/orig -> origin/gh/kwen2501/286/orig 2025-12-04T08:57:43.8173675Z * [new branch] gh/kwen2501/287/base -> origin/gh/kwen2501/287/base 2025-12-04T08:57:43.8174876Z * [new branch] gh/kwen2501/287/head -> origin/gh/kwen2501/287/head 2025-12-04T08:57:43.8176031Z * [new branch] gh/kwen2501/287/orig -> origin/gh/kwen2501/287/orig 2025-12-04T08:57:43.8177649Z * [new branch] gh/kwen2501/288/base -> origin/gh/kwen2501/288/base 2025-12-04T08:57:43.8178899Z * [new branch] gh/kwen2501/288/head -> origin/gh/kwen2501/288/head 2025-12-04T08:57:43.8180293Z * [new branch] gh/kwen2501/288/orig -> origin/gh/kwen2501/288/orig 2025-12-04T08:57:43.8182062Z * [new branch] gh/laithsakka/251/base -> origin/gh/laithsakka/251/base 2025-12-04T08:57:43.8183154Z * [new branch] gh/laithsakka/251/head -> origin/gh/laithsakka/251/head 2025-12-04T08:57:43.8184282Z * [new branch] gh/laithsakka/251/orig -> origin/gh/laithsakka/251/orig 2025-12-04T08:57:43.8185865Z * [new branch] gh/laithsakka/276/base -> origin/gh/laithsakka/276/base 2025-12-04T08:57:43.8186996Z * [new branch] gh/laithsakka/276/head -> origin/gh/laithsakka/276/head 2025-12-04T08:57:43.8188102Z * [new branch] gh/laithsakka/276/orig -> origin/gh/laithsakka/276/orig 2025-12-04T08:57:43.8189709Z * [new branch] gh/laithsakka/28/base -> origin/gh/laithsakka/28/base 2025-12-04T08:57:43.8191142Z * [new branch] gh/laithsakka/29/base -> origin/gh/laithsakka/29/base 2025-12-04T08:57:43.8192470Z * [new branch] gh/laithsakka/30/base -> origin/gh/laithsakka/30/base 2025-12-04T08:57:43.8193575Z * [new 
branch] gh/laithsakka/30/head -> origin/gh/laithsakka/30/head 2025-12-04T08:57:43.8194883Z * [new branch] gh/laithsakka/31/base -> origin/gh/laithsakka/31/base 2025-12-04T08:57:43.8195874Z * [new branch] gh/laithsakka/31/head -> origin/gh/laithsakka/31/head 2025-12-04T08:57:43.8197494Z * [new branch] gh/laithsakka/313/base -> origin/gh/laithsakka/313/base 2025-12-04T08:57:43.8198543Z * [new branch] gh/laithsakka/313/head -> origin/gh/laithsakka/313/head 2025-12-04T08:57:43.8199719Z * [new branch] gh/laithsakka/313/orig -> origin/gh/laithsakka/313/orig 2025-12-04T08:57:43.8201411Z * [new branch] gh/laithsakka/316/base -> origin/gh/laithsakka/316/base 2025-12-04T08:57:43.8202482Z * [new branch] gh/laithsakka/316/head -> origin/gh/laithsakka/316/head 2025-12-04T08:57:43.8203570Z * [new branch] gh/laithsakka/316/orig -> origin/gh/laithsakka/316/orig 2025-12-04T08:57:43.8205532Z * [new branch] gh/laithsakka/317/base -> origin/gh/laithsakka/317/base 2025-12-04T08:57:43.8206567Z * [new branch] gh/laithsakka/317/head -> origin/gh/laithsakka/317/head 2025-12-04T08:57:43.8207640Z * [new branch] gh/laithsakka/317/orig -> origin/gh/laithsakka/317/orig 2025-12-04T08:57:43.8209147Z * [new branch] gh/laithsakka/319/base -> origin/gh/laithsakka/319/base 2025-12-04T08:57:43.8210272Z * [new branch] gh/laithsakka/319/head -> origin/gh/laithsakka/319/head 2025-12-04T08:57:43.8211411Z * [new branch] gh/laithsakka/319/orig -> origin/gh/laithsakka/319/orig 2025-12-04T08:57:43.8212718Z * [new branch] gh/laithsakka/32/base -> origin/gh/laithsakka/32/base 2025-12-04T08:57:43.8214063Z * [new branch] gh/laithsakka/32/head -> origin/gh/laithsakka/32/head 2025-12-04T08:57:43.8215844Z * [new branch] gh/laithsakka/320/base -> origin/gh/laithsakka/320/base 2025-12-04T08:57:43.8216939Z * [new branch] gh/laithsakka/320/head -> origin/gh/laithsakka/320/head 2025-12-04T08:57:43.8218027Z * [new branch] gh/laithsakka/320/orig -> origin/gh/laithsakka/320/orig 2025-12-04T08:57:43.8219720Z * [new branch] gh/laithsakka/321/base -> origin/gh/laithsakka/321/base 2025-12-04T08:57:43.8220819Z * [new branch] gh/laithsakka/321/head -> origin/gh/laithsakka/321/head 2025-12-04T08:57:43.8222176Z * [new branch] gh/laithsakka/321/orig -> origin/gh/laithsakka/321/orig 2025-12-04T08:57:43.8224272Z * [new branch] gh/laithsakka/322/base -> origin/gh/laithsakka/322/base 2025-12-04T08:57:43.8225635Z * [new branch] gh/laithsakka/322/head -> origin/gh/laithsakka/322/head 2025-12-04T08:57:43.8226757Z * [new branch] gh/laithsakka/322/orig -> origin/gh/laithsakka/322/orig 2025-12-04T08:57:43.8228285Z * [new branch] gh/laithsakka/323/base -> origin/gh/laithsakka/323/base 2025-12-04T08:57:43.8229535Z * [new branch] gh/laithsakka/323/head -> origin/gh/laithsakka/323/head 2025-12-04T08:57:43.8230691Z * [new branch] gh/laithsakka/323/orig -> origin/gh/laithsakka/323/orig 2025-12-04T08:57:43.8232393Z * [new branch] gh/laithsakka/324/base -> origin/gh/laithsakka/324/base 2025-12-04T08:57:43.8233400Z * [new branch] gh/laithsakka/324/head -> origin/gh/laithsakka/324/head 2025-12-04T08:57:43.8234608Z * [new branch] gh/laithsakka/324/orig -> origin/gh/laithsakka/324/orig 2025-12-04T08:57:43.8236149Z * [new branch] gh/laithsakka/325/base -> origin/gh/laithsakka/325/base 2025-12-04T08:57:43.8237245Z * [new branch] gh/laithsakka/325/head -> origin/gh/laithsakka/325/head 2025-12-04T08:57:43.8238375Z * [new branch] gh/laithsakka/325/orig -> origin/gh/laithsakka/325/orig 2025-12-04T08:57:43.8240050Z * [new branch] gh/laithsakka/326/base -> origin/gh/laithsakka/326/base 
2025-12-04T08:57:43.8241161Z * [new branch] gh/laithsakka/326/head -> origin/gh/laithsakka/326/head 2025-12-04T08:57:43.8242298Z * [new branch] gh/laithsakka/326/orig -> origin/gh/laithsakka/326/orig 2025-12-04T08:57:43.8243843Z * [new branch] gh/laithsakka/327/base -> origin/gh/laithsakka/327/base 2025-12-04T08:57:43.8244971Z * [new branch] gh/laithsakka/327/head -> origin/gh/laithsakka/327/head 2025-12-04T08:57:43.8246138Z * [new branch] gh/laithsakka/327/orig -> origin/gh/laithsakka/327/orig 2025-12-04T08:57:43.8247728Z * [new branch] gh/laithsakka/328/base -> origin/gh/laithsakka/328/base 2025-12-04T08:57:43.8248836Z * [new branch] gh/laithsakka/328/head -> origin/gh/laithsakka/328/head 2025-12-04T08:57:43.8249915Z * [new branch] gh/laithsakka/328/orig -> origin/gh/laithsakka/328/orig 2025-12-04T08:57:43.8251577Z * [new branch] gh/liangel/4/base -> origin/gh/liangel/4/base 2025-12-04T08:57:43.8252729Z * [new branch] gh/liangel/4/head -> origin/gh/liangel/4/head 2025-12-04T08:57:43.8254248Z * [new branch] gh/liangel/4/orig -> origin/gh/liangel/4/orig 2025-12-04T08:57:43.8258083Z * [new branch] gh/lucaskabela/1/base -> origin/gh/lucaskabela/1/base 2025-12-04T08:57:43.8259236Z * [new branch] gh/lucaskabela/1/head -> origin/gh/lucaskabela/1/head 2025-12-04T08:57:43.8260927Z * [new branch] gh/lw/4/base -> origin/gh/lw/4/base 2025-12-04T08:57:43.8262248Z * [new branch] gh/lw/4/head -> origin/gh/lw/4/head 2025-12-04T08:57:43.8263316Z * [new branch] gh/lw/4/orig -> origin/gh/lw/4/orig 2025-12-04T08:57:43.8264792Z * [new branch] gh/lw/5/base -> origin/gh/lw/5/base 2025-12-04T08:57:43.8266003Z * [new branch] gh/lw/5/head -> origin/gh/lw/5/head 2025-12-04T08:57:43.8267124Z * [new branch] gh/lw/5/orig -> origin/gh/lw/5/orig 2025-12-04T08:57:43.8268529Z * [new branch] gh/lw/6/base -> origin/gh/lw/6/base 2025-12-04T08:57:43.8270117Z * [new branch] gh/lw/6/head -> origin/gh/lw/6/head 2025-12-04T08:57:43.8271223Z * [new branch] gh/lw/6/orig -> origin/gh/lw/6/orig 2025-12-04T08:57:43.8273002Z * [new branch] gh/malfet/14/base -> origin/gh/malfet/14/base 2025-12-04T08:57:43.8274368Z * [new branch] gh/malfet/417/base -> origin/gh/malfet/417/base 2025-12-04T08:57:43.8275443Z * [new branch] gh/malfet/417/head -> origin/gh/malfet/417/head 2025-12-04T08:57:43.8276643Z * [new branch] gh/malfet/417/orig -> origin/gh/malfet/417/orig 2025-12-04T08:57:43.8278077Z * [new branch] gh/malfet/506/base -> origin/gh/malfet/506/base 2025-12-04T08:57:43.8279612Z * [new branch] gh/malfet/506/head -> origin/gh/malfet/506/head 2025-12-04T08:57:43.8280749Z * [new branch] gh/malfet/506/orig -> origin/gh/malfet/506/orig 2025-12-04T08:57:43.8282307Z * [new branch] gh/malfet/517/base -> origin/gh/malfet/517/base 2025-12-04T08:57:43.8283468Z * [new branch] gh/malfet/517/head -> origin/gh/malfet/517/head 2025-12-04T08:57:43.8284989Z * [new branch] gh/malfet/528/base -> origin/gh/malfet/528/base 2025-12-04T08:57:43.8286084Z * [new branch] gh/malfet/528/head -> origin/gh/malfet/528/head 2025-12-04T08:57:43.8287215Z * [new branch] gh/malfet/528/orig -> origin/gh/malfet/528/orig 2025-12-04T08:57:43.8288724Z * [new branch] gh/malfet/537/base -> origin/gh/malfet/537/base 2025-12-04T08:57:43.8289827Z * [new branch] gh/malfet/537/head -> origin/gh/malfet/537/head 2025-12-04T08:57:43.8290961Z * [new branch] gh/malfet/537/orig -> origin/gh/malfet/537/orig 2025-12-04T08:57:43.8292738Z * [new branch] gh/malfet/546/base -> origin/gh/malfet/546/base 2025-12-04T08:57:43.8294388Z * [new branch] gh/malfet/546/head -> origin/gh/malfet/546/head 
2025-12-04T08:57:43.8295447Z * [new branch] gh/malfet/546/orig -> origin/gh/malfet/546/orig 2025-12-04T08:57:43.8296900Z * [new branch] gh/malfet/565/base -> origin/gh/malfet/565/base 2025-12-04T08:57:43.8297891Z * [new branch] gh/malfet/565/head -> origin/gh/malfet/565/head 2025-12-04T08:57:43.8299013Z * [new branch] gh/malfet/565/orig -> origin/gh/malfet/565/orig 2025-12-04T08:57:43.8300523Z * [new branch] gh/malfet/575/base -> origin/gh/malfet/575/base 2025-12-04T08:57:43.8301640Z * [new branch] gh/malfet/575/head -> origin/gh/malfet/575/head 2025-12-04T08:57:43.8302784Z * [new branch] gh/malfet/575/orig -> origin/gh/malfet/575/orig 2025-12-04T08:57:43.8304361Z * [new branch] gh/malfet/580/base -> origin/gh/malfet/580/base 2025-12-04T08:57:43.8305617Z * [new branch] gh/malfet/580/head -> origin/gh/malfet/580/head 2025-12-04T08:57:43.8306764Z * [new branch] gh/malfet/580/orig -> origin/gh/malfet/580/orig 2025-12-04T08:57:43.8308281Z * [new branch] gh/malfet/581/base -> origin/gh/malfet/581/base 2025-12-04T08:57:43.8309335Z * [new branch] gh/malfet/581/head -> origin/gh/malfet/581/head 2025-12-04T08:57:43.8310419Z * [new branch] gh/malfet/581/orig -> origin/gh/malfet/581/orig 2025-12-04T08:57:43.8311817Z * [new branch] gh/malfet/583/base -> origin/gh/malfet/583/base 2025-12-04T08:57:43.8312893Z * [new branch] gh/malfet/583/head -> origin/gh/malfet/583/head 2025-12-04T08:57:43.8313989Z * [new branch] gh/malfet/583/orig -> origin/gh/malfet/583/orig 2025-12-04T08:57:43.8315397Z * [new branch] gh/malfet/586/base -> origin/gh/malfet/586/base 2025-12-04T08:57:43.8316486Z * [new branch] gh/malfet/586/head -> origin/gh/malfet/586/head 2025-12-04T08:57:43.8317703Z * [new branch] gh/malfet/586/orig -> origin/gh/malfet/586/orig 2025-12-04T08:57:43.8319028Z * [new branch] gh/malfet/587/base -> origin/gh/malfet/587/base 2025-12-04T08:57:43.8320091Z * [new branch] gh/malfet/587/head -> origin/gh/malfet/587/head 2025-12-04T08:57:43.8321148Z * [new branch] gh/malfet/587/orig -> origin/gh/malfet/587/orig 2025-12-04T08:57:43.8322715Z * [new branch] gh/malfet/588/base -> origin/gh/malfet/588/base 2025-12-04T08:57:43.8323776Z * [new branch] gh/malfet/588/head -> origin/gh/malfet/588/head 2025-12-04T08:57:43.8325001Z * [new branch] gh/malfet/588/orig -> origin/gh/malfet/588/orig 2025-12-04T08:57:43.8326461Z * [new branch] gh/malfet/589/base -> origin/gh/malfet/589/base 2025-12-04T08:57:43.8327527Z * [new branch] gh/malfet/589/head -> origin/gh/malfet/589/head 2025-12-04T08:57:43.8329138Z * [new branch] gh/malfet/589/orig -> origin/gh/malfet/589/orig 2025-12-04T08:57:43.8330549Z * [new branch] gh/malfet/590/base -> origin/gh/malfet/590/base 2025-12-04T08:57:43.8331705Z * [new branch] gh/malfet/590/head -> origin/gh/malfet/590/head 2025-12-04T08:57:43.8332854Z * [new branch] gh/malfet/590/orig -> origin/gh/malfet/590/orig 2025-12-04T08:57:43.8335114Z * [new branch] gh/malfet/591/base -> origin/gh/malfet/591/base 2025-12-04T08:57:43.8336243Z * [new branch] gh/malfet/591/head -> origin/gh/malfet/591/head 2025-12-04T08:57:43.8337395Z * [new branch] gh/malfet/591/orig -> origin/gh/malfet/591/orig 2025-12-04T08:57:43.8339094Z * [new branch] gh/malfet/592/base -> origin/gh/malfet/592/base 2025-12-04T08:57:43.8340211Z * [new branch] gh/malfet/592/head -> origin/gh/malfet/592/head 2025-12-04T08:57:43.8341365Z * [new branch] gh/malfet/592/orig -> origin/gh/malfet/592/orig 2025-12-04T08:57:43.8342928Z * [new branch] gh/malfet/593/base -> origin/gh/malfet/593/base 2025-12-04T08:57:43.8344090Z * [new branch] 
gh/malfet/593/head -> origin/gh/malfet/593/head 2025-12-04T08:57:43.8345239Z * [new branch] gh/malfet/593/orig -> origin/gh/malfet/593/orig 2025-12-04T08:57:43.8346822Z * [new branch] gh/malfet/594/base -> origin/gh/malfet/594/base 2025-12-04T08:57:43.8347915Z * [new branch] gh/malfet/594/head -> origin/gh/malfet/594/head 2025-12-04T08:57:43.8349020Z * [new branch] gh/malfet/594/orig -> origin/gh/malfet/594/orig 2025-12-04T08:57:43.8350403Z * [new branch] gh/malfet/595/base -> origin/gh/malfet/595/base 2025-12-04T08:57:43.8351484Z * [new branch] gh/malfet/595/head -> origin/gh/malfet/595/head 2025-12-04T08:57:43.8352602Z * [new branch] gh/malfet/595/orig -> origin/gh/malfet/595/orig 2025-12-04T08:57:43.8354197Z * [new branch] gh/malfet/596/base -> origin/gh/malfet/596/base 2025-12-04T08:57:43.8355253Z * [new branch] gh/malfet/596/head -> origin/gh/malfet/596/head 2025-12-04T08:57:43.8356404Z * [new branch] gh/malfet/596/orig -> origin/gh/malfet/596/orig 2025-12-04T08:57:43.8357881Z * [new branch] gh/malfet/597/base -> origin/gh/malfet/597/base 2025-12-04T08:57:43.8358976Z * [new branch] gh/malfet/597/head -> origin/gh/malfet/597/head 2025-12-04T08:57:43.8360118Z * [new branch] gh/malfet/597/orig -> origin/gh/malfet/597/orig 2025-12-04T08:57:43.8361585Z * [new branch] gh/malfet/598/base -> origin/gh/malfet/598/base 2025-12-04T08:57:43.8362683Z * [new branch] gh/malfet/598/head -> origin/gh/malfet/598/head 2025-12-04T08:57:43.8363874Z * [new branch] gh/malfet/598/orig -> origin/gh/malfet/598/orig 2025-12-04T08:57:43.8365288Z * [new branch] gh/malfet/599/base -> origin/gh/malfet/599/base 2025-12-04T08:57:43.8366342Z * [new branch] gh/malfet/599/head -> origin/gh/malfet/599/head 2025-12-04T08:57:43.8367488Z * [new branch] gh/malfet/599/orig -> origin/gh/malfet/599/orig 2025-12-04T08:57:43.8369129Z * [new branch] gh/malfet/600/base -> origin/gh/malfet/600/base 2025-12-04T08:57:43.8370319Z * [new branch] gh/malfet/600/head -> origin/gh/malfet/600/head 2025-12-04T08:57:43.8371402Z * [new branch] gh/malfet/600/orig -> origin/gh/malfet/600/orig 2025-12-04T08:57:43.8373551Z * [new branch] gh/malfet/601/base -> origin/gh/malfet/601/base 2025-12-04T08:57:43.8374892Z * [new branch] gh/malfet/601/head -> origin/gh/malfet/601/head 2025-12-04T08:57:43.8376070Z * [new branch] gh/malfet/601/orig -> origin/gh/malfet/601/orig 2025-12-04T08:57:43.8377633Z * [new branch] gh/malfet/602/base -> origin/gh/malfet/602/base 2025-12-04T08:57:43.8378900Z * [new branch] gh/malfet/602/head -> origin/gh/malfet/602/head 2025-12-04T08:57:43.8382112Z * [new branch] gh/malfet/602/orig -> origin/gh/malfet/602/orig 2025-12-04T08:57:43.8383611Z * [new branch] gh/malfet/603/base -> origin/gh/malfet/603/base 2025-12-04T08:57:43.8384668Z * [new branch] gh/malfet/603/head -> origin/gh/malfet/603/head 2025-12-04T08:57:43.8385803Z * [new branch] gh/malfet/603/orig -> origin/gh/malfet/603/orig 2025-12-04T08:57:43.8387418Z * [new branch] gh/malfet/604/base -> origin/gh/malfet/604/base 2025-12-04T08:57:43.8388558Z * [new branch] gh/malfet/604/head -> origin/gh/malfet/604/head 2025-12-04T08:57:43.8389702Z * [new branch] gh/malfet/604/orig -> origin/gh/malfet/604/orig 2025-12-04T08:57:43.8391383Z * [new branch] gh/malfet/605/base -> origin/gh/malfet/605/base 2025-12-04T08:57:43.8392500Z * [new branch] gh/malfet/605/head -> origin/gh/malfet/605/head 2025-12-04T08:57:43.8393674Z * [new branch] gh/malfet/605/orig -> origin/gh/malfet/605/orig 2025-12-04T08:57:43.8395181Z * [new branch] gh/malfet/606/base -> origin/gh/malfet/606/base 
2025-12-04T08:57:43.8396817Z * [new branch] gh/malfet/606/head -> origin/gh/malfet/606/head 2025-12-04T08:57:43.8398044Z * [new branch] gh/malfet/606/orig -> origin/gh/malfet/606/orig 2025-12-04T08:57:43.8399535Z * [new branch] gh/malfet/607/base -> origin/gh/malfet/607/base 2025-12-04T08:57:43.8400645Z * [new branch] gh/malfet/607/head -> origin/gh/malfet/607/head 2025-12-04T08:57:43.8401816Z * [new branch] gh/malfet/607/orig -> origin/gh/malfet/607/orig 2025-12-04T08:57:43.8403405Z * [new branch] gh/malfet/608/base -> origin/gh/malfet/608/base 2025-12-04T08:57:43.8404516Z * [new branch] gh/malfet/608/head -> origin/gh/malfet/608/head 2025-12-04T08:57:43.8405627Z * [new branch] gh/malfet/608/orig -> origin/gh/malfet/608/orig 2025-12-04T08:57:43.8407226Z * [new branch] gh/malfet/609/base -> origin/gh/malfet/609/base 2025-12-04T08:57:43.8408275Z * [new branch] gh/malfet/609/head -> origin/gh/malfet/609/head 2025-12-04T08:57:43.8409462Z * [new branch] gh/malfet/609/orig -> origin/gh/malfet/609/orig 2025-12-04T08:57:43.8410939Z * [new branch] gh/malfet/610/base -> origin/gh/malfet/610/base 2025-12-04T08:57:43.8412280Z * [new branch] gh/malfet/610/head -> origin/gh/malfet/610/head 2025-12-04T08:57:43.8413265Z * [new branch] gh/malfet/610/orig -> origin/gh/malfet/610/orig 2025-12-04T08:57:43.8415057Z * [new branch] gh/malfet/611/base -> origin/gh/malfet/611/base 2025-12-04T08:57:43.8416221Z * [new branch] gh/malfet/611/head -> origin/gh/malfet/611/head 2025-12-04T08:57:43.8417398Z * [new branch] gh/malfet/611/orig -> origin/gh/malfet/611/orig 2025-12-04T08:57:43.8418881Z * [new branch] gh/malfet/612/base -> origin/gh/malfet/612/base 2025-12-04T08:57:43.8420004Z * [new branch] gh/malfet/612/head -> origin/gh/malfet/612/head 2025-12-04T08:57:43.8421636Z * [new branch] gh/malfet/612/orig -> origin/gh/malfet/612/orig 2025-12-04T08:57:43.8423675Z * [new branch] gh/malfet/64/base -> origin/gh/malfet/64/base 2025-12-04T08:57:43.8424784Z * [new branch] gh/malfet/64/head -> origin/gh/malfet/64/head 2025-12-04T08:57:43.8426764Z * [new branch] gh/manuelcandales/11/base -> origin/gh/manuelcandales/11/base 2025-12-04T08:57:43.8427838Z * [new branch] gh/manuelcandales/11/head -> origin/gh/manuelcandales/11/head 2025-12-04T08:57:43.8429000Z * [new branch] gh/manuelcandales/11/orig -> origin/gh/manuelcandales/11/orig 2025-12-04T08:57:43.8431331Z * [new branch] gh/markkm/1/base -> origin/gh/markkm/1/base 2025-12-04T08:57:43.8433224Z * [new branch] gh/masnesral/1/base -> origin/gh/masnesral/1/base 2025-12-04T08:57:43.8434322Z * [new branch] gh/masnesral/1/head -> origin/gh/masnesral/1/head 2025-12-04T08:57:43.8435440Z * [new branch] gh/masnesral/1/orig -> origin/gh/masnesral/1/orig 2025-12-04T08:57:43.8437606Z * [new branch] gh/mhorowitz/0/base -> origin/gh/mhorowitz/0/base 2025-12-04T08:57:43.8438706Z * [new branch] gh/mhorowitz/0/head -> origin/gh/mhorowitz/0/head 2025-12-04T08:57:43.8440055Z * [new branch] gh/mhorowitz/1/base -> origin/gh/mhorowitz/1/base 2025-12-04T08:57:43.8441306Z * [new branch] gh/mhorowitz/1/head -> origin/gh/mhorowitz/1/head 2025-12-04T08:57:43.8442658Z * [new branch] gh/mhorowitz/2/base -> origin/gh/mhorowitz/2/base 2025-12-04T08:57:43.8443760Z * [new branch] gh/mhorowitz/2/head -> origin/gh/mhorowitz/2/head 2025-12-04T08:57:43.8445065Z * [new branch] gh/mhorowitz/3/base -> origin/gh/mhorowitz/3/base 2025-12-04T08:57:43.8446256Z * [new branch] gh/mhorowitz/3/head -> origin/gh/mhorowitz/3/head 2025-12-04T08:57:43.8447523Z * [new branch] gh/mhorowitz/4/base -> origin/gh/mhorowitz/4/base 
2025-12-04T08:57:43.8448582Z * [new branch] gh/mhorowitz/4/head -> origin/gh/mhorowitz/4/head 2025-12-04T08:57:43.8449934Z * [new branch] gh/mhorowitz/5/base -> origin/gh/mhorowitz/5/base 2025-12-04T08:57:43.8450946Z * [new branch] gh/mhorowitz/5/head -> origin/gh/mhorowitz/5/head 2025-12-04T08:57:43.8452238Z * [new branch] gh/mhorowitz/6/base -> origin/gh/mhorowitz/6/base 2025-12-04T08:57:43.8453283Z * [new branch] gh/mhorowitz/6/head -> origin/gh/mhorowitz/6/head 2025-12-04T08:57:43.8455502Z * [new branch] gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base 2025-12-04T08:57:43.8456642Z * [new branch] gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head 2025-12-04T08:57:43.8458167Z * [new branch] gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base 2025-12-04T08:57:43.8459326Z * [new branch] gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head 2025-12-04T08:57:43.8460814Z * [new branch] gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base 2025-12-04T08:57:43.8461751Z * [new branch] gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head 2025-12-04T08:57:43.8463193Z * [new branch] gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base 2025-12-04T08:57:43.8464250Z * [new branch] gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head 2025-12-04T08:57:43.8465871Z * [new branch] gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base 2025-12-04T08:57:43.8466940Z * [new branch] gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head 2025-12-04T08:57:43.8468413Z * [new branch] gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base 2025-12-04T08:57:43.8469525Z * [new branch] gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head 2025-12-04T08:57:43.8470641Z * [new branch] gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig 2025-12-04T08:57:43.8472277Z * [new branch] gh/mikaylagawarecki/341/base -> origin/gh/mikaylagawarecki/341/base 2025-12-04T08:57:43.8473466Z * [new branch] gh/mikaylagawarecki/341/head -> origin/gh/mikaylagawarecki/341/head 2025-12-04T08:57:43.8474549Z * [new branch] gh/mikaylagawarecki/341/orig -> origin/gh/mikaylagawarecki/341/orig 2025-12-04T08:57:43.8476242Z * [new branch] gh/mikaylagawarecki/342/base -> origin/gh/mikaylagawarecki/342/base 2025-12-04T08:57:43.8477310Z * [new branch] gh/mikaylagawarecki/342/head -> origin/gh/mikaylagawarecki/342/head 2025-12-04T08:57:43.8478396Z * [new branch] gh/mikaylagawarecki/342/orig -> origin/gh/mikaylagawarecki/342/orig 2025-12-04T08:57:43.8480397Z * [new branch] gh/mikaylagawarecki/345/base -> origin/gh/mikaylagawarecki/345/base 2025-12-04T08:57:43.8481457Z * [new branch] gh/mikaylagawarecki/345/head -> origin/gh/mikaylagawarecki/345/head 2025-12-04T08:57:43.8482637Z * [new branch] gh/mikaylagawarecki/345/orig -> origin/gh/mikaylagawarecki/345/orig 2025-12-04T08:57:43.8484257Z * [new branch] gh/mikaylagawarecki/346/base -> origin/gh/mikaylagawarecki/346/base 2025-12-04T08:57:43.8485380Z * [new branch] gh/mikaylagawarecki/346/head -> origin/gh/mikaylagawarecki/346/head 2025-12-04T08:57:43.8486541Z * [new branch] gh/mikaylagawarecki/346/orig -> origin/gh/mikaylagawarecki/346/orig 2025-12-04T08:57:43.8488254Z * [new branch] gh/mikaylagawarecki/347/base -> origin/gh/mikaylagawarecki/347/base 2025-12-04T08:57:43.8489415Z * [new branch] gh/mikaylagawarecki/347/head -> origin/gh/mikaylagawarecki/347/head 2025-12-04T08:57:43.8490511Z * [new branch] 
gh/mikaylagawarecki/347/orig -> origin/gh/mikaylagawarecki/347/orig 2025-12-04T08:57:43.8492196Z * [new branch] gh/mikaylagawarecki/350/base -> origin/gh/mikaylagawarecki/350/base 2025-12-04T08:57:43.8493352Z * [new branch] gh/mikaylagawarecki/350/head -> origin/gh/mikaylagawarecki/350/head 2025-12-04T08:57:43.8494720Z * [new branch] gh/mikaylagawarecki/350/orig -> origin/gh/mikaylagawarecki/350/orig 2025-12-04T08:57:43.8496784Z * [new branch] gh/mikaylagawarecki/351/base -> origin/gh/mikaylagawarecki/351/base 2025-12-04T08:57:43.8497994Z * [new branch] gh/mikaylagawarecki/351/head -> origin/gh/mikaylagawarecki/351/head 2025-12-04T08:57:43.8499668Z * [new branch] gh/mikaylagawarecki/351/orig -> origin/gh/mikaylagawarecki/351/orig 2025-12-04T08:57:43.8501399Z * [new branch] gh/mikaylagawarecki/352/base -> origin/gh/mikaylagawarecki/352/base 2025-12-04T08:57:43.8502704Z * [new branch] gh/mikaylagawarecki/352/head -> origin/gh/mikaylagawarecki/352/head 2025-12-04T08:57:43.8503847Z * [new branch] gh/mikaylagawarecki/352/orig -> origin/gh/mikaylagawarecki/352/orig 2025-12-04T08:57:43.8505775Z * [new branch] gh/mikaylagawarecki/353/base -> origin/gh/mikaylagawarecki/353/base 2025-12-04T08:57:43.8507164Z * [new branch] gh/mikaylagawarecki/353/head -> origin/gh/mikaylagawarecki/353/head 2025-12-04T08:57:43.8508287Z * [new branch] gh/mikaylagawarecki/353/orig -> origin/gh/mikaylagawarecki/353/orig 2025-12-04T08:57:43.8509729Z * [new branch] gh/mikaylagawarecki/354/base -> origin/gh/mikaylagawarecki/354/base 2025-12-04T08:57:43.8510842Z * [new branch] gh/mikaylagawarecki/354/head -> origin/gh/mikaylagawarecki/354/head 2025-12-04T08:57:43.8512005Z * [new branch] gh/mikaylagawarecki/354/orig -> origin/gh/mikaylagawarecki/354/orig 2025-12-04T08:57:43.8513873Z * [new branch] gh/mikaylagawarecki/356/base -> origin/gh/mikaylagawarecki/356/base 2025-12-04T08:57:43.8515047Z * [new branch] gh/mikaylagawarecki/356/head -> origin/gh/mikaylagawarecki/356/head 2025-12-04T08:57:43.8516174Z * [new branch] gh/mikaylagawarecki/356/orig -> origin/gh/mikaylagawarecki/356/orig 2025-12-04T08:57:43.8517527Z * [new branch] gh/mikaylagawarecki/357/base -> origin/gh/mikaylagawarecki/357/base 2025-12-04T08:57:43.8518628Z * [new branch] gh/mikaylagawarecki/357/head -> origin/gh/mikaylagawarecki/357/head 2025-12-04T08:57:43.8519702Z * [new branch] gh/mikaylagawarecki/357/orig -> origin/gh/mikaylagawarecki/357/orig 2025-12-04T08:57:43.8521360Z * [new branch] gh/mikaylagawarecki/359/base -> origin/gh/mikaylagawarecki/359/base 2025-12-04T08:57:43.8522735Z * [new branch] gh/mikaylagawarecki/359/head -> origin/gh/mikaylagawarecki/359/head 2025-12-04T08:57:43.8523860Z * [new branch] gh/mikaylagawarecki/359/orig -> origin/gh/mikaylagawarecki/359/orig 2025-12-04T08:57:43.8525359Z * [new branch] gh/mikaylagawarecki/360/base -> origin/gh/mikaylagawarecki/360/base 2025-12-04T08:57:43.8526467Z * [new branch] gh/mikaylagawarecki/360/head -> origin/gh/mikaylagawarecki/360/head 2025-12-04T08:57:43.8527611Z * [new branch] gh/mikaylagawarecki/360/orig -> origin/gh/mikaylagawarecki/360/orig 2025-12-04T08:57:43.8529141Z * [new branch] gh/mikaylagawarecki/361/base -> origin/gh/mikaylagawarecki/361/base 2025-12-04T08:57:43.8530236Z * [new branch] gh/mikaylagawarecki/361/head -> origin/gh/mikaylagawarecki/361/head 2025-12-04T08:57:43.8531326Z * [new branch] gh/mikaylagawarecki/361/orig -> origin/gh/mikaylagawarecki/361/orig 2025-12-04T08:57:43.8532927Z * [new branch] gh/mikaylagawarecki/362/base -> origin/gh/mikaylagawarecki/362/base 
2025-12-04T08:57:43.8534631Z * [new branch] gh/mikaylagawarecki/362/head -> origin/gh/mikaylagawarecki/362/head 2025-12-04T08:57:43.8535703Z * [new branch] gh/mikaylagawarecki/362/orig -> origin/gh/mikaylagawarecki/362/orig 2025-12-04T08:57:43.8537596Z * [new branch] gh/mikaylagawarecki/363/base -> origin/gh/mikaylagawarecki/363/base 2025-12-04T08:57:43.8538982Z * [new branch] gh/mikaylagawarecki/363/head -> origin/gh/mikaylagawarecki/363/head 2025-12-04T08:57:43.8540132Z * [new branch] gh/mikaylagawarecki/363/orig -> origin/gh/mikaylagawarecki/363/orig 2025-12-04T08:57:43.8542200Z * [new branch] gh/mikaylagawarecki/364/base -> origin/gh/mikaylagawarecki/364/base 2025-12-04T08:57:43.8543341Z * [new branch] gh/mikaylagawarecki/364/head -> origin/gh/mikaylagawarecki/364/head 2025-12-04T08:57:43.8544515Z * [new branch] gh/mikaylagawarecki/364/orig -> origin/gh/mikaylagawarecki/364/orig 2025-12-04T08:57:43.8546371Z * [new branch] gh/mikaylagawarecki/365/base -> origin/gh/mikaylagawarecki/365/base 2025-12-04T08:57:43.8547492Z * [new branch] gh/mikaylagawarecki/365/head -> origin/gh/mikaylagawarecki/365/head 2025-12-04T08:57:43.8548824Z * [new branch] gh/mikaylagawarecki/365/orig -> origin/gh/mikaylagawarecki/365/orig 2025-12-04T08:57:43.8550363Z * [new branch] gh/mikaylagawarecki/366/base -> origin/gh/mikaylagawarecki/366/base 2025-12-04T08:57:43.8551321Z * [new branch] gh/mikaylagawarecki/366/head -> origin/gh/mikaylagawarecki/366/head 2025-12-04T08:57:43.8552506Z * [new branch] gh/mikaylagawarecki/366/orig -> origin/gh/mikaylagawarecki/366/orig 2025-12-04T08:57:43.8553974Z * [new branch] gh/mikaylagawarecki/367/base -> origin/gh/mikaylagawarecki/367/base 2025-12-04T08:57:43.8555243Z * [new branch] gh/mikaylagawarecki/367/head -> origin/gh/mikaylagawarecki/367/head 2025-12-04T08:57:43.8556370Z * [new branch] gh/mikaylagawarecki/367/orig -> origin/gh/mikaylagawarecki/367/orig 2025-12-04T08:57:43.8557905Z * [new branch] gh/mikaylagawarecki/368/base -> origin/gh/mikaylagawarecki/368/base 2025-12-04T08:57:43.8558997Z * [new branch] gh/mikaylagawarecki/368/head -> origin/gh/mikaylagawarecki/368/head 2025-12-04T08:57:43.8560157Z * [new branch] gh/mikaylagawarecki/368/orig -> origin/gh/mikaylagawarecki/368/orig 2025-12-04T08:57:43.8561636Z * [new branch] gh/mikaylagawarecki/369/base -> origin/gh/mikaylagawarecki/369/base 2025-12-04T08:57:43.8562784Z * [new branch] gh/mikaylagawarecki/369/head -> origin/gh/mikaylagawarecki/369/head 2025-12-04T08:57:43.8563853Z * [new branch] gh/mikaylagawarecki/369/orig -> origin/gh/mikaylagawarecki/369/orig 2025-12-04T08:57:43.8565463Z * [new branch] gh/mikaylagawarecki/370/base -> origin/gh/mikaylagawarecki/370/base 2025-12-04T08:57:43.8566606Z * [new branch] gh/mikaylagawarecki/370/head -> origin/gh/mikaylagawarecki/370/head 2025-12-04T08:57:43.8567705Z * [new branch] gh/mikaylagawarecki/370/orig -> origin/gh/mikaylagawarecki/370/orig 2025-12-04T08:57:43.8569200Z * [new branch] gh/mikaylagawarecki/371/base -> origin/gh/mikaylagawarecki/371/base 2025-12-04T08:57:43.8570389Z * [new branch] gh/mikaylagawarecki/371/head -> origin/gh/mikaylagawarecki/371/head 2025-12-04T08:57:43.8571895Z * [new branch] gh/mikaylagawarecki/371/orig -> origin/gh/mikaylagawarecki/371/orig 2025-12-04T08:57:43.8573730Z * [new branch] gh/mikaylagawarecki/372/base -> origin/gh/mikaylagawarecki/372/base 2025-12-04T08:57:43.8574979Z * [new branch] gh/mikaylagawarecki/372/head -> origin/gh/mikaylagawarecki/372/head 2025-12-04T08:57:43.8576284Z * [new branch] gh/mikaylagawarecki/372/orig -> 
origin/gh/mikaylagawarecki/372/orig 2025-12-04T08:57:43.8577786Z * [new branch] gh/mikaylagawarecki/373/base -> origin/gh/mikaylagawarecki/373/base 2025-12-04T08:57:43.8579110Z * [new branch] gh/mikaylagawarecki/373/head -> origin/gh/mikaylagawarecki/373/head 2025-12-04T08:57:43.8580345Z * [new branch] gh/mikaylagawarecki/373/orig -> origin/gh/mikaylagawarecki/373/orig 2025-12-04T08:57:43.8581894Z * [new branch] gh/mikaylagawarecki/374/base -> origin/gh/mikaylagawarecki/374/base 2025-12-04T08:57:43.8583047Z * [new branch] gh/mikaylagawarecki/374/head -> origin/gh/mikaylagawarecki/374/head 2025-12-04T08:57:43.8584167Z * [new branch] gh/mikaylagawarecki/374/orig -> origin/gh/mikaylagawarecki/374/orig 2025-12-04T08:57:43.8585697Z * [new branch] gh/mikaylagawarecki/375/base -> origin/gh/mikaylagawarecki/375/base 2025-12-04T08:57:43.8587043Z * [new branch] gh/mikaylagawarecki/375/head -> origin/gh/mikaylagawarecki/375/head 2025-12-04T08:57:43.8588178Z * [new branch] gh/mikaylagawarecki/375/orig -> origin/gh/mikaylagawarecki/375/orig 2025-12-04T08:57:43.8589761Z * [new branch] gh/mikaylagawarecki/376/base -> origin/gh/mikaylagawarecki/376/base 2025-12-04T08:57:43.8591067Z * [new branch] gh/mikaylagawarecki/376/head -> origin/gh/mikaylagawarecki/376/head 2025-12-04T08:57:43.8592337Z * [new branch] gh/mikaylagawarecki/376/orig -> origin/gh/mikaylagawarecki/376/orig 2025-12-04T08:57:43.8593741Z * [new branch] gh/mikaylagawarecki/377/base -> origin/gh/mikaylagawarecki/377/base 2025-12-04T08:57:43.8594894Z * [new branch] gh/mikaylagawarecki/377/head -> origin/gh/mikaylagawarecki/377/head 2025-12-04T08:57:43.8596043Z * [new branch] gh/mikaylagawarecki/377/orig -> origin/gh/mikaylagawarecki/377/orig 2025-12-04T08:57:43.8597540Z * [new branch] gh/mikaylagawarecki/378/base -> origin/gh/mikaylagawarecki/378/base 2025-12-04T08:57:43.8598703Z * [new branch] gh/mikaylagawarecki/378/head -> origin/gh/mikaylagawarecki/378/head 2025-12-04T08:57:43.8599814Z * [new branch] gh/mikaylagawarecki/378/orig -> origin/gh/mikaylagawarecki/378/orig 2025-12-04T08:57:43.8601321Z * [new branch] gh/mikaylagawarecki/379/base -> origin/gh/mikaylagawarecki/379/base 2025-12-04T08:57:43.8602485Z * [new branch] gh/mikaylagawarecki/379/head -> origin/gh/mikaylagawarecki/379/head 2025-12-04T08:57:43.8603584Z * [new branch] gh/mikaylagawarecki/379/orig -> origin/gh/mikaylagawarecki/379/orig 2025-12-04T08:57:43.8604964Z * [new branch] gh/mikaylagawarecki/380/base -> origin/gh/mikaylagawarecki/380/base 2025-12-04T08:57:43.8606505Z * [new branch] gh/mikaylagawarecki/380/head -> origin/gh/mikaylagawarecki/380/head 2025-12-04T08:57:43.8607637Z * [new branch] gh/mikaylagawarecki/380/orig -> origin/gh/mikaylagawarecki/380/orig 2025-12-04T08:57:43.8608979Z * [new branch] gh/mikaylagawarecki/381/base -> origin/gh/mikaylagawarecki/381/base 2025-12-04T08:57:43.8610065Z * [new branch] gh/mikaylagawarecki/381/head -> origin/gh/mikaylagawarecki/381/head 2025-12-04T08:57:43.8611127Z * [new branch] gh/mikaylagawarecki/381/orig -> origin/gh/mikaylagawarecki/381/orig 2025-12-04T08:57:43.8612467Z * [new branch] gh/mikaylagawarecki/382/base -> origin/gh/mikaylagawarecki/382/base 2025-12-04T08:57:43.8613865Z * [new branch] gh/mikaylagawarecki/382/head -> origin/gh/mikaylagawarecki/382/head 2025-12-04T08:57:43.8615020Z * [new branch] gh/mikaylagawarecki/382/orig -> origin/gh/mikaylagawarecki/382/orig 2025-12-04T08:57:43.8616628Z * [new branch] gh/mikaylagawarecki/383/base -> origin/gh/mikaylagawarecki/383/base 2025-12-04T08:57:43.8617938Z * [new branch] 
gh/mikaylagawarecki/383/head -> origin/gh/mikaylagawarecki/383/head 2025-12-04T08:57:43.8619121Z * [new branch] gh/mikaylagawarecki/383/orig -> origin/gh/mikaylagawarecki/383/orig 2025-12-04T08:57:43.8620657Z * [new branch] gh/mikaylagawarecki/384/base -> origin/gh/mikaylagawarecki/384/base 2025-12-04T08:57:43.8621791Z * [new branch] gh/mikaylagawarecki/384/head -> origin/gh/mikaylagawarecki/384/head 2025-12-04T08:57:43.8622974Z * [new branch] gh/mikaylagawarecki/384/orig -> origin/gh/mikaylagawarecki/384/orig 2025-12-04T08:57:43.8624535Z * [new branch] gh/mikaylagawarecki/385/base -> origin/gh/mikaylagawarecki/385/base 2025-12-04T08:57:43.8625843Z * [new branch] gh/mikaylagawarecki/385/head -> origin/gh/mikaylagawarecki/385/head 2025-12-04T08:57:43.8626985Z * [new branch] gh/mikaylagawarecki/385/orig -> origin/gh/mikaylagawarecki/385/orig 2025-12-04T08:57:43.8628615Z * [new branch] gh/mikaylagawarecki/386/base -> origin/gh/mikaylagawarecki/386/base 2025-12-04T08:57:43.8629713Z * [new branch] gh/mikaylagawarecki/386/head -> origin/gh/mikaylagawarecki/386/head 2025-12-04T08:57:43.8630863Z * [new branch] gh/mikaylagawarecki/386/orig -> origin/gh/mikaylagawarecki/386/orig 2025-12-04T08:57:43.8632345Z * [new branch] gh/mikaylagawarecki/387/base -> origin/gh/mikaylagawarecki/387/base 2025-12-04T08:57:43.8633635Z * [new branch] gh/mikaylagawarecki/387/head -> origin/gh/mikaylagawarecki/387/head 2025-12-04T08:57:43.8634676Z * [new branch] gh/mikaylagawarecki/387/orig -> origin/gh/mikaylagawarecki/387/orig 2025-12-04T08:57:43.8636034Z * [new branch] gh/mikaylagawarecki/388/base -> origin/gh/mikaylagawarecki/388/base 2025-12-04T08:57:43.8637120Z * [new branch] gh/mikaylagawarecki/388/head -> origin/gh/mikaylagawarecki/388/head 2025-12-04T08:57:43.8638298Z * [new branch] gh/mikaylagawarecki/388/orig -> origin/gh/mikaylagawarecki/388/orig 2025-12-04T08:57:43.8640210Z * [new branch] gh/mikaylagawarecki/389/base -> origin/gh/mikaylagawarecki/389/base 2025-12-04T08:57:43.8641343Z * [new branch] gh/mikaylagawarecki/389/head -> origin/gh/mikaylagawarecki/389/head 2025-12-04T08:57:43.8642506Z * [new branch] gh/mikaylagawarecki/389/orig -> origin/gh/mikaylagawarecki/389/orig 2025-12-04T08:57:43.8644086Z * [new branch] gh/mikaylagawarecki/390/base -> origin/gh/mikaylagawarecki/390/base 2025-12-04T08:57:43.8645092Z * [new branch] gh/mikaylagawarecki/390/head -> origin/gh/mikaylagawarecki/390/head 2025-12-04T08:57:43.8646182Z * [new branch] gh/mikaylagawarecki/390/orig -> origin/gh/mikaylagawarecki/390/orig 2025-12-04T08:57:43.8647811Z * [new branch] gh/mikaylagawarecki/391/base -> origin/gh/mikaylagawarecki/391/base 2025-12-04T08:57:43.8649051Z * [new branch] gh/mikaylagawarecki/391/head -> origin/gh/mikaylagawarecki/391/head 2025-12-04T08:57:43.8650177Z * [new branch] gh/mikaylagawarecki/391/orig -> origin/gh/mikaylagawarecki/391/orig 2025-12-04T08:57:43.8652258Z * [new branch] gh/mikaylagawarecki/392/base -> origin/gh/mikaylagawarecki/392/base 2025-12-04T08:57:43.8654172Z * [new branch] gh/mikaylagawarecki/392/head -> origin/gh/mikaylagawarecki/392/head 2025-12-04T08:57:43.8655307Z * [new branch] gh/mikaylagawarecki/392/orig -> origin/gh/mikaylagawarecki/392/orig 2025-12-04T08:57:43.8657109Z * [new branch] gh/mlazos/41/base -> origin/gh/mlazos/41/base 2025-12-04T08:57:43.8658202Z * [new branch] gh/mlazos/41/head -> origin/gh/mlazos/41/head 2025-12-04T08:57:43.8659323Z * [new branch] gh/mlazos/41/orig -> origin/gh/mlazos/41/orig 2025-12-04T08:57:43.8660884Z * [new branch] gh/mlazos/42/base -> 
origin/gh/mlazos/42/base 2025-12-04T08:57:43.8661951Z * [new branch] gh/mlazos/42/head -> origin/gh/mlazos/42/head 2025-12-04T08:57:43.8663067Z * [new branch] gh/mlazos/42/orig -> origin/gh/mlazos/42/orig 2025-12-04T08:57:43.8664517Z * [new branch] gh/mlazos/43/base -> origin/gh/mlazos/43/base 2025-12-04T08:57:43.8665819Z * [new branch] gh/mlazos/43/head -> origin/gh/mlazos/43/head 2025-12-04T08:57:43.8666994Z * [new branch] gh/mlazos/43/orig -> origin/gh/mlazos/43/orig 2025-12-04T08:57:43.8668297Z * [new branch] gh/mlazos/44/base -> origin/gh/mlazos/44/base 2025-12-04T08:57:43.8669375Z * [new branch] gh/mlazos/44/head -> origin/gh/mlazos/44/head 2025-12-04T08:57:43.8670443Z * [new branch] gh/mlazos/44/orig -> origin/gh/mlazos/44/orig 2025-12-04T08:57:43.8671897Z * [new branch] gh/mlazos/47/base -> origin/gh/mlazos/47/base 2025-12-04T08:57:43.8672964Z * [new branch] gh/mlazos/47/head -> origin/gh/mlazos/47/head 2025-12-04T08:57:43.8674062Z * [new branch] gh/mlazos/47/orig -> origin/gh/mlazos/47/orig 2025-12-04T08:57:43.8675421Z * [new branch] gh/mlazos/48/base -> origin/gh/mlazos/48/base 2025-12-04T08:57:43.8676489Z * [new branch] gh/mlazos/48/head -> origin/gh/mlazos/48/head 2025-12-04T08:57:43.8677699Z * [new branch] gh/mlazos/48/orig -> origin/gh/mlazos/48/orig 2025-12-04T08:57:43.8679559Z * [new branch] gh/mlazos/49/base -> origin/gh/mlazos/49/base 2025-12-04T08:57:43.8680683Z * [new branch] gh/mlazos/49/head -> origin/gh/mlazos/49/head 2025-12-04T08:57:43.8681778Z * [new branch] gh/mlazos/49/orig -> origin/gh/mlazos/49/orig 2025-12-04T08:57:43.8683396Z * [new branch] gh/mlazos/50/base -> origin/gh/mlazos/50/base 2025-12-04T08:57:43.8684240Z * [new branch] gh/mlazos/50/head -> origin/gh/mlazos/50/head 2025-12-04T08:57:43.8685435Z * [new branch] gh/mlazos/50/orig -> origin/gh/mlazos/50/orig 2025-12-04T08:57:43.8686890Z * [new branch] gh/mlazos/51/base -> origin/gh/mlazos/51/base 2025-12-04T08:57:43.8688131Z * [new branch] gh/mlazos/51/head -> origin/gh/mlazos/51/head 2025-12-04T08:57:43.8689096Z * [new branch] gh/mlazos/51/orig -> origin/gh/mlazos/51/orig 2025-12-04T08:57:43.8690584Z * [new branch] gh/mlazos/52/base -> origin/gh/mlazos/52/base 2025-12-04T08:57:43.8691818Z * [new branch] gh/mlazos/52/head -> origin/gh/mlazos/52/head 2025-12-04T08:57:43.8692891Z * [new branch] gh/mlazos/52/orig -> origin/gh/mlazos/52/orig 2025-12-04T08:57:43.8694808Z * [new branch] gh/mlazos/53/base -> origin/gh/mlazos/53/base 2025-12-04T08:57:43.8695986Z * [new branch] gh/mlazos/53/head -> origin/gh/mlazos/53/head 2025-12-04T08:57:43.8697147Z * [new branch] gh/mlazos/53/orig -> origin/gh/mlazos/53/orig 2025-12-04T08:57:43.8698598Z * [new branch] gh/mlazos/54/base -> origin/gh/mlazos/54/base 2025-12-04T08:57:43.8699692Z * [new branch] gh/mlazos/54/head -> origin/gh/mlazos/54/head 2025-12-04T08:57:43.8700796Z * [new branch] gh/mlazos/54/orig -> origin/gh/mlazos/54/orig 2025-12-04T08:57:43.8702218Z * [new branch] gh/mlazos/55/base -> origin/gh/mlazos/55/base 2025-12-04T08:57:43.8703296Z * [new branch] gh/mlazos/55/head -> origin/gh/mlazos/55/head 2025-12-04T08:57:43.8704943Z * [new branch] gh/mlazos/55/orig -> origin/gh/mlazos/55/orig 2025-12-04T08:57:43.8706518Z * [new branch] gh/mlazos/56/base -> origin/gh/mlazos/56/base 2025-12-04T08:57:43.8707675Z * [new branch] gh/mlazos/56/head -> origin/gh/mlazos/56/head 2025-12-04T08:57:43.8708864Z * [new branch] gh/mlazos/56/orig -> origin/gh/mlazos/56/orig 2025-12-04T08:57:43.8710398Z * [new branch] gh/mlazos/57/base -> origin/gh/mlazos/57/base 
2025-12-04T08:57:43.8711474Z * [new branch] gh/mlazos/57/head -> origin/gh/mlazos/57/head 2025-12-04T08:57:43.8712533Z * [new branch] gh/mlazos/57/orig -> origin/gh/mlazos/57/orig 2025-12-04T08:57:43.8713969Z * [new branch] gh/mlazos/58/base -> origin/gh/mlazos/58/base 2025-12-04T08:57:43.8715074Z * [new branch] gh/mlazos/58/head -> origin/gh/mlazos/58/head 2025-12-04T08:57:43.8716197Z * [new branch] gh/mlazos/58/orig -> origin/gh/mlazos/58/orig 2025-12-04T08:57:43.8717607Z * [new branch] gh/mlazos/59/base -> origin/gh/mlazos/59/base 2025-12-04T08:57:43.8718707Z * [new branch] gh/mlazos/59/head -> origin/gh/mlazos/59/head 2025-12-04T08:57:43.8719729Z * [new branch] gh/mlazos/59/orig -> origin/gh/mlazos/59/orig 2025-12-04T08:57:43.8721172Z * [new branch] gh/mlazos/60/base -> origin/gh/mlazos/60/base 2025-12-04T08:57:43.8722282Z * [new branch] gh/mlazos/60/head -> origin/gh/mlazos/60/head 2025-12-04T08:57:43.8723460Z * [new branch] gh/mlazos/60/orig -> origin/gh/mlazos/60/orig 2025-12-04T08:57:43.8725305Z * [new branch] gh/mlazos/61/base -> origin/gh/mlazos/61/base 2025-12-04T08:57:43.8726428Z * [new branch] gh/mlazos/61/head -> origin/gh/mlazos/61/head 2025-12-04T08:57:43.8727514Z * [new branch] gh/mlazos/61/orig -> origin/gh/mlazos/61/orig 2025-12-04T08:57:43.8729032Z * [new branch] gh/mlazos/62/base -> origin/gh/mlazos/62/base 2025-12-04T08:57:43.8730131Z * [new branch] gh/mlazos/62/head -> origin/gh/mlazos/62/head 2025-12-04T08:57:43.8731218Z * [new branch] gh/mlazos/62/orig -> origin/gh/mlazos/62/orig 2025-12-04T08:57:43.8732752Z * [new branch] gh/mlazos/63/base -> origin/gh/mlazos/63/base 2025-12-04T08:57:43.8734229Z * [new branch] gh/mlazos/63/head -> origin/gh/mlazos/63/head 2025-12-04T08:57:43.8735402Z * [new branch] gh/mlazos/63/orig -> origin/gh/mlazos/63/orig 2025-12-04T08:57:43.8736836Z * [new branch] gh/mlazos/64/base -> origin/gh/mlazos/64/base 2025-12-04T08:57:43.8737989Z * [new branch] gh/mlazos/64/head -> origin/gh/mlazos/64/head 2025-12-04T08:57:43.8739080Z * [new branch] gh/mlazos/64/orig -> origin/gh/mlazos/64/orig 2025-12-04T08:57:43.8740726Z * [new branch] gh/mlazos/65/base -> origin/gh/mlazos/65/base 2025-12-04T08:57:43.8741901Z * [new branch] gh/mlazos/65/head -> origin/gh/mlazos/65/head 2025-12-04T08:57:43.8743011Z * [new branch] gh/mlazos/65/orig -> origin/gh/mlazos/65/orig 2025-12-04T08:57:43.8744506Z * [new branch] gh/mlazos/66/base -> origin/gh/mlazos/66/base 2025-12-04T08:57:43.8745691Z * [new branch] gh/mlazos/66/head -> origin/gh/mlazos/66/head 2025-12-04T08:57:43.8746801Z * [new branch] gh/mlazos/66/orig -> origin/gh/mlazos/66/orig 2025-12-04T08:57:43.8748264Z * [new branch] gh/mlazos/67/base -> origin/gh/mlazos/67/base 2025-12-04T08:57:43.8749868Z * [new branch] gh/mlazos/67/head -> origin/gh/mlazos/67/head 2025-12-04T08:57:43.8750980Z * [new branch] gh/mlazos/67/orig -> origin/gh/mlazos/67/orig 2025-12-04T08:57:43.8752441Z * [new branch] gh/mlazos/68/base -> origin/gh/mlazos/68/base 2025-12-04T08:57:43.8753524Z * [new branch] gh/mlazos/68/head -> origin/gh/mlazos/68/head 2025-12-04T08:57:43.8755071Z * [new branch] gh/mlazos/68/orig -> origin/gh/mlazos/68/orig 2025-12-04T08:57:43.8756703Z * [new branch] gh/mlazos/69/base -> origin/gh/mlazos/69/base 2025-12-04T08:57:43.8757780Z * [new branch] gh/mlazos/69/head -> origin/gh/mlazos/69/head 2025-12-04T08:57:43.8758862Z * [new branch] gh/mlazos/69/orig -> origin/gh/mlazos/69/orig 2025-12-04T08:57:43.8760329Z * [new branch] gh/mlazos/70/base -> origin/gh/mlazos/70/base 2025-12-04T08:57:43.8761871Z * [new branch] 
gh/mlazos/70/head -> origin/gh/mlazos/70/head 2025-12-04T08:57:43.8762992Z * [new branch] gh/mlazos/70/orig -> origin/gh/mlazos/70/orig 2025-12-04T08:57:43.8764509Z * [new branch] gh/mlazos/71/base -> origin/gh/mlazos/71/base 2025-12-04T08:57:43.8765706Z * [new branch] gh/mlazos/71/head -> origin/gh/mlazos/71/head 2025-12-04T08:57:43.8766772Z * [new branch] gh/mlazos/71/orig -> origin/gh/mlazos/71/orig 2025-12-04T08:57:43.8768261Z * [new branch] gh/mlazos/72/base -> origin/gh/mlazos/72/base 2025-12-04T08:57:43.8769335Z * [new branch] gh/mlazos/72/head -> origin/gh/mlazos/72/head 2025-12-04T08:57:43.8770497Z * [new branch] gh/mlazos/72/orig -> origin/gh/mlazos/72/orig 2025-12-04T08:57:43.8772109Z * [new branch] gh/mlazos/73/base -> origin/gh/mlazos/73/base 2025-12-04T08:57:43.8773300Z * [new branch] gh/mlazos/73/head -> origin/gh/mlazos/73/head 2025-12-04T08:57:43.8774883Z * [new branch] gh/mlazos/73/orig -> origin/gh/mlazos/73/orig 2025-12-04T08:57:43.8776665Z * [new branch] gh/mrmiywj/1/base -> origin/gh/mrmiywj/1/base 2025-12-04T08:57:43.8778335Z * [new branch] gh/mrmiywj/1/head -> origin/gh/mrmiywj/1/head 2025-12-04T08:57:43.8780646Z * [new branch] gh/muchulee8/73/base -> origin/gh/muchulee8/73/base 2025-12-04T08:57:43.8781899Z * [new branch] gh/muchulee8/73/head -> origin/gh/muchulee8/73/head 2025-12-04T08:57:43.8783138Z * [new branch] gh/muchulee8/73/orig -> origin/gh/muchulee8/73/orig 2025-12-04T08:57:43.8784996Z * [new branch] gh/naveenthangudu/1/base -> origin/gh/naveenthangudu/1/base 2025-12-04T08:57:43.8786162Z * [new branch] gh/naveenthangudu/1/head -> origin/gh/naveenthangudu/1/head 2025-12-04T08:57:43.8787494Z * [new branch] gh/naveenthangudu/1/orig -> origin/gh/naveenthangudu/1/orig 2025-12-04T08:57:43.8788975Z * [new branch] gh/naveenthangudu/2/base -> origin/gh/naveenthangudu/2/base 2025-12-04T08:57:43.8790168Z * [new branch] gh/naveenthangudu/2/head -> origin/gh/naveenthangudu/2/head 2025-12-04T08:57:43.8791426Z * [new branch] gh/naveenthangudu/2/orig -> origin/gh/naveenthangudu/2/orig 2025-12-04T08:57:43.8792782Z * [new branch] gh/naveenthangudu/3/base -> origin/gh/naveenthangudu/3/base 2025-12-04T08:57:43.8793893Z * [new branch] gh/naveenthangudu/3/head -> origin/gh/naveenthangudu/3/head 2025-12-04T08:57:43.8795032Z * [new branch] gh/naveenthangudu/3/orig -> origin/gh/naveenthangudu/3/orig 2025-12-04T08:57:43.8798495Z * [new branch] gh/naveenthangudu/4/base -> origin/gh/naveenthangudu/4/base 2025-12-04T08:57:43.8798783Z * [new branch] gh/naveenthangudu/4/head -> origin/gh/naveenthangudu/4/head 2025-12-04T08:57:43.8799068Z * [new branch] gh/naveenthangudu/4/orig -> origin/gh/naveenthangudu/4/orig 2025-12-04T08:57:43.8800205Z * [new branch] gh/naveenthangudu/5/base -> origin/gh/naveenthangudu/5/base 2025-12-04T08:57:43.8801342Z * [new branch] gh/naveenthangudu/5/head -> origin/gh/naveenthangudu/5/head 2025-12-04T08:57:43.8802780Z * [new branch] gh/naveenthangudu/5/orig -> origin/gh/naveenthangudu/5/orig 2025-12-04T08:57:43.8804241Z * [new branch] gh/naveenthangudu/6/base -> origin/gh/naveenthangudu/6/base 2025-12-04T08:57:43.8805330Z * [new branch] gh/naveenthangudu/6/head -> origin/gh/naveenthangudu/6/head 2025-12-04T08:57:43.8806415Z * [new branch] gh/naveenthangudu/6/orig -> origin/gh/naveenthangudu/6/orig 2025-12-04T08:57:43.8807820Z * [new branch] gh/naveenthangudu/7/base -> origin/gh/naveenthangudu/7/base 2025-12-04T08:57:43.8808889Z * [new branch] gh/naveenthangudu/7/head -> origin/gh/naveenthangudu/7/head 2025-12-04T08:57:43.8809962Z * [new branch] 
gh/naveenthangudu/7/orig -> origin/gh/naveenthangudu/7/orig 2025-12-04T08:57:43.8811378Z * [new branch] gh/naveenthangudu/8/base -> origin/gh/naveenthangudu/8/base 2025-12-04T08:57:43.8812510Z * [new branch] gh/naveenthangudu/8/head -> origin/gh/naveenthangudu/8/head 2025-12-04T08:57:43.8814001Z * [new branch] gh/naveenthangudu/8/orig -> origin/gh/naveenthangudu/8/orig 2025-12-04T08:57:43.8815473Z * [new branch] gh/naveenthangudu/9/base -> origin/gh/naveenthangudu/9/base 2025-12-04T08:57:43.8816706Z * [new branch] gh/naveenthangudu/9/head -> origin/gh/naveenthangudu/9/head 2025-12-04T08:57:43.8818085Z * [new branch] gh/naveenthangudu/9/orig -> origin/gh/naveenthangudu/9/orig 2025-12-04T08:57:43.8819747Z * [new branch] gh/nikitaved/1/base -> origin/gh/nikitaved/1/base 2025-12-04T08:57:43.8820859Z * [new branch] gh/nikitaved/1/head -> origin/gh/nikitaved/1/head 2025-12-04T08:57:43.8822053Z * [new branch] gh/nikitaved/1/orig -> origin/gh/nikitaved/1/orig 2025-12-04T08:57:43.8823613Z * [new branch] gh/nikitaved/10/base -> origin/gh/nikitaved/10/base 2025-12-04T08:57:43.8824808Z * [new branch] gh/nikitaved/10/head -> origin/gh/nikitaved/10/head 2025-12-04T08:57:43.8826062Z * [new branch] gh/nikitaved/10/orig -> origin/gh/nikitaved/10/orig 2025-12-04T08:57:43.8827456Z * [new branch] gh/nikitaved/11/base -> origin/gh/nikitaved/11/base 2025-12-04T08:57:43.8828625Z * [new branch] gh/nikitaved/11/head -> origin/gh/nikitaved/11/head 2025-12-04T08:57:43.8830168Z * [new branch] gh/nikitaved/11/orig -> origin/gh/nikitaved/11/orig 2025-12-04T08:57:43.8831553Z * [new branch] gh/nikitaved/12/base -> origin/gh/nikitaved/12/base 2025-12-04T08:57:43.8832790Z * [new branch] gh/nikitaved/12/head -> origin/gh/nikitaved/12/head 2025-12-04T08:57:43.8833877Z * [new branch] gh/nikitaved/12/orig -> origin/gh/nikitaved/12/orig 2025-12-04T08:57:43.8835760Z * [new branch] gh/nikitaved/13/base -> origin/gh/nikitaved/13/base 2025-12-04T08:57:43.8836816Z * [new branch] gh/nikitaved/13/head -> origin/gh/nikitaved/13/head 2025-12-04T08:57:43.8837924Z * [new branch] gh/nikitaved/13/orig -> origin/gh/nikitaved/13/orig 2025-12-04T08:57:43.8839410Z * [new branch] gh/nikitaved/14/base -> origin/gh/nikitaved/14/base 2025-12-04T08:57:43.8840511Z * [new branch] gh/nikitaved/14/head -> origin/gh/nikitaved/14/head 2025-12-04T08:57:43.8841616Z * [new branch] gh/nikitaved/14/orig -> origin/gh/nikitaved/14/orig 2025-12-04T08:57:43.8843146Z * [new branch] gh/nikitaved/15/base -> origin/gh/nikitaved/15/base 2025-12-04T08:57:43.8844302Z * [new branch] gh/nikitaved/15/head -> origin/gh/nikitaved/15/head 2025-12-04T08:57:43.8845364Z * [new branch] gh/nikitaved/15/orig -> origin/gh/nikitaved/15/orig 2025-12-04T08:57:43.8846803Z * [new branch] gh/nikitaved/16/base -> origin/gh/nikitaved/16/base 2025-12-04T08:57:43.8847971Z * [new branch] gh/nikitaved/16/head -> origin/gh/nikitaved/16/head 2025-12-04T08:57:43.8849116Z * [new branch] gh/nikitaved/16/orig -> origin/gh/nikitaved/16/orig 2025-12-04T08:57:43.8850593Z * [new branch] gh/nikitaved/2/base -> origin/gh/nikitaved/2/base 2025-12-04T08:57:43.8851730Z * [new branch] gh/nikitaved/2/head -> origin/gh/nikitaved/2/head 2025-12-04T08:57:43.8853330Z * [new branch] gh/nikitaved/2/orig -> origin/gh/nikitaved/2/orig 2025-12-04T08:57:43.8855069Z * [new branch] gh/nikitaved/4/base -> origin/gh/nikitaved/4/base 2025-12-04T08:57:43.8856306Z * [new branch] gh/nikitaved/4/head -> origin/gh/nikitaved/4/head 2025-12-04T08:57:43.8857458Z * [new branch] gh/nikitaved/4/orig -> origin/gh/nikitaved/4/orig 
2025-12-04T08:57:43.8858979Z * [new branch] gh/nikitaved/5/base -> origin/gh/nikitaved/5/base 2025-12-04T08:57:43.8860117Z * [new branch] gh/nikitaved/5/head -> origin/gh/nikitaved/5/head 2025-12-04T08:57:43.8861209Z * [new branch] gh/nikitaved/5/orig -> origin/gh/nikitaved/5/orig 2025-12-04T08:57:43.8862708Z * [new branch] gh/nikitaved/6/base -> origin/gh/nikitaved/6/base 2025-12-04T08:57:43.8863945Z * [new branch] gh/nikitaved/6/head -> origin/gh/nikitaved/6/head 2025-12-04T08:57:43.8864998Z * [new branch] gh/nikitaved/6/orig -> origin/gh/nikitaved/6/orig 2025-12-04T08:57:43.8866589Z * [new branch] gh/nikitaved/8/base -> origin/gh/nikitaved/8/base 2025-12-04T08:57:43.8867706Z * [new branch] gh/nikitaved/8/head -> origin/gh/nikitaved/8/head 2025-12-04T08:57:43.8868786Z * [new branch] gh/nikitaved/8/orig -> origin/gh/nikitaved/8/orig 2025-12-04T08:57:43.8870196Z * [new branch] gh/nikitaved/9/base -> origin/gh/nikitaved/9/base 2025-12-04T08:57:43.8871280Z * [new branch] gh/nikitaved/9/head -> origin/gh/nikitaved/9/head 2025-12-04T08:57:43.8872385Z * [new branch] gh/nikitaved/9/orig -> origin/gh/nikitaved/9/orig 2025-12-04T08:57:43.8874955Z * [new branch] gh/oulgen/10/base -> origin/gh/oulgen/10/base 2025-12-04T08:57:43.8876050Z * [new branch] gh/oulgen/10/head -> origin/gh/oulgen/10/head 2025-12-04T08:57:43.8877145Z * [new branch] gh/oulgen/10/orig -> origin/gh/oulgen/10/orig 2025-12-04T08:57:43.8878896Z * [new branch] gh/oulgen/11/base -> origin/gh/oulgen/11/base 2025-12-04T08:57:43.8884582Z * [new branch] gh/oulgen/11/head -> origin/gh/oulgen/11/head 2025-12-04T08:57:43.8885796Z * [new branch] gh/oulgen/11/orig -> origin/gh/oulgen/11/orig 2025-12-04T08:57:43.8887312Z * [new branch] gh/oulgen/12/base -> origin/gh/oulgen/12/base 2025-12-04T08:57:43.8888472Z * [new branch] gh/oulgen/12/head -> origin/gh/oulgen/12/head 2025-12-04T08:57:43.8889568Z * [new branch] gh/oulgen/12/orig -> origin/gh/oulgen/12/orig 2025-12-04T08:57:43.8891204Z * [new branch] gh/oulgen/13/base -> origin/gh/oulgen/13/base 2025-12-04T08:57:43.8892300Z * [new branch] gh/oulgen/13/head -> origin/gh/oulgen/13/head 2025-12-04T08:57:43.8893584Z * [new branch] gh/oulgen/13/orig -> origin/gh/oulgen/13/orig 2025-12-04T08:57:43.8895161Z * [new branch] gh/oulgen/14/base -> origin/gh/oulgen/14/base 2025-12-04T08:57:43.8896238Z * [new branch] gh/oulgen/14/head -> origin/gh/oulgen/14/head 2025-12-04T08:57:43.8897407Z * [new branch] gh/oulgen/14/orig -> origin/gh/oulgen/14/orig 2025-12-04T08:57:43.8899059Z * [new branch] gh/oulgen/15/base -> origin/gh/oulgen/15/base 2025-12-04T08:57:43.8900235Z * [new branch] gh/oulgen/15/head -> origin/gh/oulgen/15/head 2025-12-04T08:57:43.8901384Z * [new branch] gh/oulgen/15/orig -> origin/gh/oulgen/15/orig 2025-12-04T08:57:43.8903247Z * [new branch] gh/oulgen/16/base -> origin/gh/oulgen/16/base 2025-12-04T08:57:43.8904603Z * [new branch] gh/oulgen/16/head -> origin/gh/oulgen/16/head 2025-12-04T08:57:43.8905669Z * [new branch] gh/oulgen/16/orig -> origin/gh/oulgen/16/orig 2025-12-04T08:57:43.8907131Z * [new branch] gh/oulgen/17/base -> origin/gh/oulgen/17/base 2025-12-04T08:57:43.8908220Z * [new branch] gh/oulgen/17/head -> origin/gh/oulgen/17/head 2025-12-04T08:57:43.8909291Z * [new branch] gh/oulgen/17/orig -> origin/gh/oulgen/17/orig 2025-12-04T08:57:43.8910739Z * [new branch] gh/oulgen/18/base -> origin/gh/oulgen/18/base 2025-12-04T08:57:43.8911824Z * [new branch] gh/oulgen/18/head -> origin/gh/oulgen/18/head 2025-12-04T08:57:43.8912961Z * [new branch] gh/oulgen/18/orig -> 
origin/gh/oulgen/18/orig 2025-12-04T08:57:43.8914633Z * [new branch] gh/oulgen/19/base -> origin/gh/oulgen/19/base 2025-12-04T08:57:43.8915610Z * [new branch] gh/oulgen/19/head -> origin/gh/oulgen/19/head 2025-12-04T08:57:43.8916763Z * [new branch] gh/oulgen/19/orig -> origin/gh/oulgen/19/orig 2025-12-04T08:57:43.8918186Z * [new branch] gh/oulgen/20/base -> origin/gh/oulgen/20/base 2025-12-04T08:57:43.8919280Z * [new branch] gh/oulgen/20/head -> origin/gh/oulgen/20/head 2025-12-04T08:57:43.8920343Z * [new branch] gh/oulgen/20/orig -> origin/gh/oulgen/20/orig 2025-12-04T08:57:43.8921693Z * [new branch] gh/oulgen/21/base -> origin/gh/oulgen/21/base 2025-12-04T08:57:43.8922768Z * [new branch] gh/oulgen/21/head -> origin/gh/oulgen/21/head 2025-12-04T08:57:43.8924130Z * [new branch] gh/oulgen/21/orig -> origin/gh/oulgen/21/orig 2025-12-04T08:57:43.8925274Z * [new branch] gh/oulgen/22/base -> origin/gh/oulgen/22/base 2025-12-04T08:57:43.8926390Z * [new branch] gh/oulgen/22/head -> origin/gh/oulgen/22/head 2025-12-04T08:57:43.8927441Z * [new branch] gh/oulgen/22/orig -> origin/gh/oulgen/22/orig 2025-12-04T08:57:43.8928940Z * [new branch] gh/oulgen/23/base -> origin/gh/oulgen/23/base 2025-12-04T08:57:43.8930010Z * [new branch] gh/oulgen/23/head -> origin/gh/oulgen/23/head 2025-12-04T08:57:43.8931126Z * [new branch] gh/oulgen/23/orig -> origin/gh/oulgen/23/orig 2025-12-04T08:57:43.8932475Z * [new branch] gh/oulgen/24/base -> origin/gh/oulgen/24/base 2025-12-04T08:57:43.8933898Z * [new branch] gh/oulgen/24/head -> origin/gh/oulgen/24/head 2025-12-04T08:57:43.8935067Z * [new branch] gh/oulgen/24/orig -> origin/gh/oulgen/24/orig 2025-12-04T08:57:43.8936613Z * [new branch] gh/oulgen/25/base -> origin/gh/oulgen/25/base 2025-12-04T08:57:43.8937629Z * [new branch] gh/oulgen/25/head -> origin/gh/oulgen/25/head 2025-12-04T08:57:43.8938799Z * [new branch] gh/oulgen/25/orig -> origin/gh/oulgen/25/orig 2025-12-04T08:57:43.8940297Z * [new branch] gh/oulgen/26/base -> origin/gh/oulgen/26/base 2025-12-04T08:57:43.8941408Z * [new branch] gh/oulgen/26/head -> origin/gh/oulgen/26/head 2025-12-04T08:57:43.8942537Z * [new branch] gh/oulgen/26/orig -> origin/gh/oulgen/26/orig 2025-12-04T08:57:43.8944098Z * [new branch] gh/oulgen/4/base -> origin/gh/oulgen/4/base 2025-12-04T08:57:43.8945207Z * [new branch] gh/oulgen/4/head -> origin/gh/oulgen/4/head 2025-12-04T08:57:43.8946370Z * [new branch] gh/oulgen/4/orig -> origin/gh/oulgen/4/orig 2025-12-04T08:57:43.8948197Z * [new branch] gh/oulgen/7/base -> origin/gh/oulgen/7/base 2025-12-04T08:57:43.8949272Z * [new branch] gh/oulgen/7/head -> origin/gh/oulgen/7/head 2025-12-04T08:57:43.8950352Z * [new branch] gh/oulgen/7/orig -> origin/gh/oulgen/7/orig 2025-12-04T08:57:43.8951842Z * [new branch] gh/oulgen/8/base -> origin/gh/oulgen/8/base 2025-12-04T08:57:43.8952919Z * [new branch] gh/oulgen/8/head -> origin/gh/oulgen/8/head 2025-12-04T08:57:43.8954065Z * [new branch] gh/oulgen/8/orig -> origin/gh/oulgen/8/orig 2025-12-04T08:57:43.8955445Z * [new branch] gh/oulgen/9/base -> origin/gh/oulgen/9/base 2025-12-04T08:57:43.8956504Z * [new branch] gh/oulgen/9/head -> origin/gh/oulgen/9/head 2025-12-04T08:57:43.8957583Z * [new branch] gh/oulgen/9/orig -> origin/gh/oulgen/9/orig 2025-12-04T08:57:43.8959311Z * [new branch] gh/patvig/mtia-serialization -> origin/gh/patvig/mtia-serialization 2025-12-04T08:57:43.8961062Z * [new branch] gh/pearu/108/base -> origin/gh/pearu/108/base 2025-12-04T08:57:43.8962270Z * [new branch] gh/pearu/108/head -> origin/gh/pearu/108/head 
2025-12-04T08:57:43.8963408Z * [new branch] gh/pearu/108/orig -> origin/gh/pearu/108/orig 2025-12-04T08:57:43.8964851Z * [new branch] gh/pearu/109/base -> origin/gh/pearu/109/base 2025-12-04T08:57:43.8965965Z * [new branch] gh/pearu/109/head -> origin/gh/pearu/109/head 2025-12-04T08:57:43.8967085Z * [new branch] gh/pearu/109/orig -> origin/gh/pearu/109/orig 2025-12-04T08:57:43.8968555Z * [new branch] gh/pearu/110/base -> origin/gh/pearu/110/base 2025-12-04T08:57:43.8969690Z * [new branch] gh/pearu/110/head -> origin/gh/pearu/110/head 2025-12-04T08:57:43.8970779Z * [new branch] gh/pearu/110/orig -> origin/gh/pearu/110/orig 2025-12-04T08:57:43.8972270Z * [new branch] gh/pearu/111/base -> origin/gh/pearu/111/base 2025-12-04T08:57:43.8973649Z * [new branch] gh/pearu/111/head -> origin/gh/pearu/111/head 2025-12-04T08:57:43.8974886Z * [new branch] gh/pearu/111/orig -> origin/gh/pearu/111/orig 2025-12-04T08:57:43.8976456Z * [new branch] gh/pearu/112/base -> origin/gh/pearu/112/base 2025-12-04T08:57:43.8977768Z * [new branch] gh/pearu/112/head -> origin/gh/pearu/112/head 2025-12-04T08:57:43.8979096Z * [new branch] gh/pearu/112/orig -> origin/gh/pearu/112/orig 2025-12-04T08:57:43.8980578Z * [new branch] gh/pearu/115/base -> origin/gh/pearu/115/base 2025-12-04T08:57:43.8981677Z * [new branch] gh/pearu/115/head -> origin/gh/pearu/115/head 2025-12-04T08:57:43.8982828Z * [new branch] gh/pearu/115/orig -> origin/gh/pearu/115/orig 2025-12-04T08:57:43.8984231Z * [new branch] gh/pearu/116/base -> origin/gh/pearu/116/base 2025-12-04T08:57:43.8985361Z * [new branch] gh/pearu/116/head -> origin/gh/pearu/116/head 2025-12-04T08:57:43.8986450Z * [new branch] gh/pearu/116/orig -> origin/gh/pearu/116/orig 2025-12-04T08:57:43.8987978Z * [new branch] gh/pearu/117/base -> origin/gh/pearu/117/base 2025-12-04T08:57:43.8989186Z * [new branch] gh/pearu/117/head -> origin/gh/pearu/117/head 2025-12-04T08:57:43.8990456Z * [new branch] gh/pearu/117/orig -> origin/gh/pearu/117/orig 2025-12-04T08:57:43.8991904Z * [new branch] gh/pearu/118/base -> origin/gh/pearu/118/base 2025-12-04T08:57:43.8992976Z * [new branch] gh/pearu/118/head -> origin/gh/pearu/118/head 2025-12-04T08:57:43.8994115Z * [new branch] gh/pearu/118/orig -> origin/gh/pearu/118/orig 2025-12-04T08:57:43.8995511Z * [new branch] gh/pearu/119/base -> origin/gh/pearu/119/base 2025-12-04T08:57:43.8996629Z * [new branch] gh/pearu/119/head -> origin/gh/pearu/119/head 2025-12-04T08:57:43.8997674Z * [new branch] gh/pearu/119/orig -> origin/gh/pearu/119/orig 2025-12-04T08:57:43.8999564Z * [new branch] gh/pearu/139/base -> origin/gh/pearu/139/base 2025-12-04T08:57:43.9000677Z * [new branch] gh/pearu/139/head -> origin/gh/pearu/139/head 2025-12-04T08:57:43.9002254Z * [new branch] gh/pearu/139/orig -> origin/gh/pearu/139/orig 2025-12-04T08:57:43.9003740Z * [new branch] gh/pearu/140/base -> origin/gh/pearu/140/base 2025-12-04T08:57:43.9004902Z * [new branch] gh/pearu/140/head -> origin/gh/pearu/140/head 2025-12-04T08:57:43.9006124Z * [new branch] gh/pearu/140/orig -> origin/gh/pearu/140/orig 2025-12-04T08:57:43.9007505Z * [new branch] gh/pearu/142/base -> origin/gh/pearu/142/base 2025-12-04T08:57:43.9008538Z * [new branch] gh/pearu/142/head -> origin/gh/pearu/142/head 2025-12-04T08:57:43.9009722Z * [new branch] gh/pearu/142/orig -> origin/gh/pearu/142/orig 2025-12-04T08:57:43.9011147Z * [new branch] gh/pearu/143/base -> origin/gh/pearu/143/base 2025-12-04T08:57:43.9012697Z * [new branch] gh/pearu/143/head -> origin/gh/pearu/143/head 2025-12-04T08:57:43.9014054Z * [new branch] 
gh/pearu/143/orig -> origin/gh/pearu/143/orig 2025-12-04T08:57:43.9015613Z * [new branch] gh/pearu/147/base -> origin/gh/pearu/147/base 2025-12-04T08:57:43.9016745Z * [new branch] gh/pearu/147/head -> origin/gh/pearu/147/head 2025-12-04T08:57:43.9017914Z * [new branch] gh/pearu/147/orig -> origin/gh/pearu/147/orig 2025-12-04T08:57:43.9019364Z * [new branch] gh/pearu/149/base -> origin/gh/pearu/149/base 2025-12-04T08:57:43.9020619Z * [new branch] gh/pearu/149/head -> origin/gh/pearu/149/head 2025-12-04T08:57:43.9021743Z * [new branch] gh/pearu/149/orig -> origin/gh/pearu/149/orig 2025-12-04T08:57:43.9023632Z * [new branch] gh/pearu/150/base -> origin/gh/pearu/150/base 2025-12-04T08:57:43.9024762Z * [new branch] gh/pearu/150/head -> origin/gh/pearu/150/head 2025-12-04T08:57:43.9026019Z * [new branch] gh/pearu/150/orig -> origin/gh/pearu/150/orig 2025-12-04T08:57:43.9027494Z * [new branch] gh/pearu/151/base -> origin/gh/pearu/151/base 2025-12-04T08:57:43.9028592Z * [new branch] gh/pearu/151/head -> origin/gh/pearu/151/head 2025-12-04T08:57:43.9029691Z * [new branch] gh/pearu/151/orig -> origin/gh/pearu/151/orig 2025-12-04T08:57:43.9031659Z * [new branch] gh/pearu/152/base -> origin/gh/pearu/152/base 2025-12-04T08:57:43.9032768Z * [new branch] gh/pearu/152/head -> origin/gh/pearu/152/head 2025-12-04T08:57:43.9033838Z * [new branch] gh/pearu/152/orig -> origin/gh/pearu/152/orig 2025-12-04T08:57:43.9035340Z * [new branch] gh/pearu/153/base -> origin/gh/pearu/153/base 2025-12-04T08:57:43.9036644Z * [new branch] gh/pearu/153/head -> origin/gh/pearu/153/head 2025-12-04T08:57:43.9037657Z * [new branch] gh/pearu/153/orig -> origin/gh/pearu/153/orig 2025-12-04T08:57:43.9039143Z * [new branch] gh/pearu/154/base -> origin/gh/pearu/154/base 2025-12-04T08:57:43.9040247Z * [new branch] gh/pearu/154/head -> origin/gh/pearu/154/head 2025-12-04T08:57:43.9041359Z * [new branch] gh/pearu/154/orig -> origin/gh/pearu/154/orig 2025-12-04T08:57:43.9042873Z * [new branch] gh/pearu/155/base -> origin/gh/pearu/155/base 2025-12-04T08:57:43.9044097Z * [new branch] gh/pearu/155/head -> origin/gh/pearu/155/head 2025-12-04T08:57:43.9045181Z * [new branch] gh/pearu/155/orig -> origin/gh/pearu/155/orig 2025-12-04T08:57:43.9046765Z * [new branch] gh/pearu/156/base -> origin/gh/pearu/156/base 2025-12-04T08:57:43.9047836Z * [new branch] gh/pearu/156/head -> origin/gh/pearu/156/head 2025-12-04T08:57:43.9048931Z * [new branch] gh/pearu/156/orig -> origin/gh/pearu/156/orig 2025-12-04T08:57:43.9050839Z * [new branch] gh/pearu/56/base -> origin/gh/pearu/56/base 2025-12-04T08:57:43.9052220Z * [new branch] gh/pearu/56/head -> origin/gh/pearu/56/head 2025-12-04T08:57:43.9053656Z * [new branch] gh/pearu/56/orig -> origin/gh/pearu/56/orig 2025-12-04T08:57:43.9055417Z * [new branch] gh/pearu/97/base -> origin/gh/pearu/97/base 2025-12-04T08:57:43.9056572Z * [new branch] gh/pearu/97/head -> origin/gh/pearu/97/head 2025-12-04T08:57:43.9057734Z * [new branch] gh/pearu/97/orig -> origin/gh/pearu/97/orig 2025-12-04T08:57:43.9059525Z * [new branch] gh/pianpwk/21/base -> origin/gh/pianpwk/21/base 2025-12-04T08:57:43.9060646Z * [new branch] gh/pianpwk/21/head -> origin/gh/pianpwk/21/head 2025-12-04T08:57:43.9062212Z * [new branch] gh/pianpwk/28/base -> origin/gh/pianpwk/28/base 2025-12-04T08:57:43.9063328Z * [new branch] gh/pianpwk/28/head -> origin/gh/pianpwk/28/head 2025-12-04T08:57:43.9064451Z * [new branch] gh/pianpwk/28/orig -> origin/gh/pianpwk/28/orig 2025-12-04T08:57:43.9066116Z * [new branch] gh/pianpwk/29/base -> 
origin/gh/pianpwk/29/base 2025-12-04T08:57:43.9067410Z * [new branch] gh/pianpwk/29/head -> origin/gh/pianpwk/29/head 2025-12-04T08:57:43.9068501Z * [new branch] gh/pianpwk/29/orig -> origin/gh/pianpwk/29/orig 2025-12-04T08:57:43.9070185Z * [new branch] gh/pianpwk/30/base -> origin/gh/pianpwk/30/base 2025-12-04T08:57:43.9071216Z * [new branch] gh/pianpwk/30/head -> origin/gh/pianpwk/30/head 2025-12-04T08:57:43.9072372Z * [new branch] gh/pianpwk/30/orig -> origin/gh/pianpwk/30/orig 2025-12-04T08:57:43.9073841Z * [new branch] gh/pianpwk/31/base -> origin/gh/pianpwk/31/base 2025-12-04T08:57:43.9074955Z * [new branch] gh/pianpwk/31/head -> origin/gh/pianpwk/31/head 2025-12-04T08:57:43.9076014Z * [new branch] gh/pianpwk/31/orig -> origin/gh/pianpwk/31/orig 2025-12-04T08:57:43.9077776Z * [new branch] gh/pianpwk/32/base -> origin/gh/pianpwk/32/base 2025-12-04T08:57:43.9079036Z * [new branch] gh/pianpwk/32/head -> origin/gh/pianpwk/32/head 2025-12-04T08:57:43.9080458Z * [new branch] gh/pianpwk/32/orig -> origin/gh/pianpwk/32/orig 2025-12-04T08:57:43.9081766Z * [new branch] gh/pianpwk/33/base -> origin/gh/pianpwk/33/base 2025-12-04T08:57:43.9083084Z * [new branch] gh/pianpwk/33/head -> origin/gh/pianpwk/33/head 2025-12-04T08:57:43.9084222Z * [new branch] gh/pianpwk/33/orig -> origin/gh/pianpwk/33/orig 2025-12-04T08:57:43.9086026Z * [new branch] gh/pianpwk/34/base -> origin/gh/pianpwk/34/base 2025-12-04T08:57:43.9087480Z * [new branch] gh/pianpwk/34/head -> origin/gh/pianpwk/34/head 2025-12-04T08:57:43.9088798Z * [new branch] gh/pianpwk/34/orig -> origin/gh/pianpwk/34/orig 2025-12-04T08:57:43.9090258Z * [new branch] gh/pianpwk/35/base -> origin/gh/pianpwk/35/base 2025-12-04T08:57:43.9091678Z * [new branch] gh/pianpwk/35/head -> origin/gh/pianpwk/35/head 2025-12-04T08:57:43.9092817Z * [new branch] gh/pianpwk/35/orig -> origin/gh/pianpwk/35/orig 2025-12-04T08:57:43.9094893Z * [new branch] gh/rec/141/base -> origin/gh/rec/141/base 2025-12-04T08:57:43.9096029Z * [new branch] gh/rec/141/head -> origin/gh/rec/141/head 2025-12-04T08:57:43.9097496Z * [new branch] gh/rec/153/base -> origin/gh/rec/153/base 2025-12-04T08:57:43.9098694Z * [new branch] gh/rec/153/head -> origin/gh/rec/153/head 2025-12-04T08:57:43.9099804Z * [new branch] gh/rec/153/orig -> origin/gh/rec/153/orig 2025-12-04T08:57:43.9101260Z * [new branch] gh/rec/154/base -> origin/gh/rec/154/base 2025-12-04T08:57:43.9102552Z * [new branch] gh/rec/154/head -> origin/gh/rec/154/head 2025-12-04T08:57:43.9103487Z * [new branch] gh/rec/154/orig -> origin/gh/rec/154/orig 2025-12-04T08:57:43.9105041Z * [new branch] gh/rec/164/base -> origin/gh/rec/164/base 2025-12-04T08:57:43.9106224Z * [new branch] gh/rec/164/head -> origin/gh/rec/164/head 2025-12-04T08:57:43.9107348Z * [new branch] gh/rec/164/orig -> origin/gh/rec/164/orig 2025-12-04T08:57:43.9108746Z * [new branch] gh/rec/166/base -> origin/gh/rec/166/base 2025-12-04T08:57:43.9109904Z * [new branch] gh/rec/166/head -> origin/gh/rec/166/head 2025-12-04T08:57:43.9110988Z * [new branch] gh/rec/166/orig -> origin/gh/rec/166/orig 2025-12-04T08:57:43.9112563Z * [new branch] gh/rec/167/base -> origin/gh/rec/167/base 2025-12-04T08:57:43.9113746Z * [new branch] gh/rec/167/head -> origin/gh/rec/167/head 2025-12-04T08:57:43.9114806Z * [new branch] gh/rec/167/orig -> origin/gh/rec/167/orig 2025-12-04T08:57:43.9116256Z * [new branch] gh/rec/168/base -> origin/gh/rec/168/base 2025-12-04T08:57:43.9117368Z * [new branch] gh/rec/168/head -> origin/gh/rec/168/head 2025-12-04T08:57:43.9118448Z * [new branch] 
gh/rec/168/orig -> origin/gh/rec/168/orig 2025-12-04T08:57:43.9119934Z * [new branch] gh/rec/169/base -> origin/gh/rec/169/base 2025-12-04T08:57:43.9120995Z * [new branch] gh/rec/169/head -> origin/gh/rec/169/head 2025-12-04T08:57:43.9122097Z * [new branch] gh/rec/169/orig -> origin/gh/rec/169/orig 2025-12-04T08:57:43.9123502Z * [new branch] gh/rec/170/base -> origin/gh/rec/170/base 2025-12-04T08:57:43.9124630Z * [new branch] gh/rec/170/head -> origin/gh/rec/170/head 2025-12-04T08:57:43.9125703Z * [new branch] gh/rec/170/orig -> origin/gh/rec/170/orig 2025-12-04T08:57:43.9127176Z * [new branch] gh/rec/171/base -> origin/gh/rec/171/base 2025-12-04T08:57:43.9128321Z * [new branch] gh/rec/171/head -> origin/gh/rec/171/head 2025-12-04T08:57:43.9129448Z * [new branch] gh/rec/171/orig -> origin/gh/rec/171/orig 2025-12-04T08:57:43.9130905Z * [new branch] gh/rec/172/base -> origin/gh/rec/172/base 2025-12-04T08:57:43.9131964Z * [new branch] gh/rec/172/head -> origin/gh/rec/172/head 2025-12-04T08:57:43.9133041Z * [new branch] gh/rec/172/orig -> origin/gh/rec/172/orig 2025-12-04T08:57:43.9134861Z * [new branch] gh/rec/173/base -> origin/gh/rec/173/base 2025-12-04T08:57:43.9135987Z * [new branch] gh/rec/173/head -> origin/gh/rec/173/head 2025-12-04T08:57:43.9137099Z * [new branch] gh/rec/173/orig -> origin/gh/rec/173/orig 2025-12-04T08:57:43.9138534Z * [new branch] gh/rec/174/base -> origin/gh/rec/174/base 2025-12-04T08:57:43.9139663Z * [new branch] gh/rec/174/head -> origin/gh/rec/174/head 2025-12-04T08:57:43.9140807Z * [new branch] gh/rec/174/orig -> origin/gh/rec/174/orig 2025-12-04T08:57:43.9142252Z * [new branch] gh/rec/175/base -> origin/gh/rec/175/base 2025-12-04T08:57:43.9143426Z * [new branch] gh/rec/175/head -> origin/gh/rec/175/head 2025-12-04T08:57:43.9144606Z * [new branch] gh/rec/175/orig -> origin/gh/rec/175/orig 2025-12-04T08:57:43.9146231Z * [new branch] gh/rec/176/base -> origin/gh/rec/176/base 2025-12-04T08:57:43.9147275Z * [new branch] gh/rec/176/head -> origin/gh/rec/176/head 2025-12-04T08:57:43.9148442Z * [new branch] gh/rec/176/orig -> origin/gh/rec/176/orig 2025-12-04T08:57:43.9149748Z * [new branch] gh/rec/177/base -> origin/gh/rec/177/base 2025-12-04T08:57:43.9150849Z * [new branch] gh/rec/177/head -> origin/gh/rec/177/head 2025-12-04T08:57:43.9151906Z * [new branch] gh/rec/177/orig -> origin/gh/rec/177/orig 2025-12-04T08:57:43.9153732Z * [new branch] gh/robert-hardwick/3/base -> origin/gh/robert-hardwick/3/base 2025-12-04T08:57:43.9154846Z * [new branch] gh/robert-hardwick/3/head -> origin/gh/robert-hardwick/3/head 2025-12-04T08:57:43.9155952Z * [new branch] gh/robert-hardwick/3/orig -> origin/gh/robert-hardwick/3/orig 2025-12-04T08:57:43.9157576Z * [new branch] gh/robert-hardwick/4/base -> origin/gh/robert-hardwick/4/base 2025-12-04T08:57:43.9158678Z * [new branch] gh/robert-hardwick/4/head -> origin/gh/robert-hardwick/4/head 2025-12-04T08:57:43.9159777Z * [new branch] gh/robert-hardwick/4/orig -> origin/gh/robert-hardwick/4/orig 2025-12-04T08:57:43.9161231Z * [new branch] gh/robert-hardwick/5/base -> origin/gh/robert-hardwick/5/base 2025-12-04T08:57:43.9162328Z * [new branch] gh/robert-hardwick/5/head -> origin/gh/robert-hardwick/5/head 2025-12-04T08:57:43.9163457Z * [new branch] gh/robert-hardwick/5/orig -> origin/gh/robert-hardwick/5/orig 2025-12-04T08:57:43.9164904Z * [new branch] gh/robert-hardwick/6/base -> origin/gh/robert-hardwick/6/base 2025-12-04T08:57:43.9166006Z * [new branch] gh/robert-hardwick/6/head -> origin/gh/robert-hardwick/6/head 
2025-12-04T08:57:43.9167064Z * [new branch] gh/robert-hardwick/6/orig -> origin/gh/robert-hardwick/6/orig 2025-12-04T08:57:43.9168474Z * [new branch] gh/robert-hardwick/7/base -> origin/gh/robert-hardwick/7/base 2025-12-04T08:57:43.9170004Z * [new branch] gh/robert-hardwick/7/head -> origin/gh/robert-hardwick/7/head 2025-12-04T08:57:43.9171159Z * [new branch] gh/robert-hardwick/7/orig -> origin/gh/robert-hardwick/7/orig 2025-12-04T08:57:43.9173284Z * [new branch] gh/robert-hardwick/8/base -> origin/gh/robert-hardwick/8/base 2025-12-04T08:57:43.9174712Z * [new branch] gh/robert-hardwick/8/head -> origin/gh/robert-hardwick/8/head 2025-12-04T08:57:43.9175807Z * [new branch] gh/robert-hardwick/8/orig -> origin/gh/robert-hardwick/8/orig 2025-12-04T08:57:43.9177448Z * [new branch] gh/robert-hardwick/9/base -> origin/gh/robert-hardwick/9/base 2025-12-04T08:57:43.9178584Z * [new branch] gh/robert-hardwick/9/head -> origin/gh/robert-hardwick/9/head 2025-12-04T08:57:43.9180006Z * [new branch] gh/robert-hardwick/9/orig -> origin/gh/robert-hardwick/9/orig 2025-12-04T08:57:43.9181764Z * [new branch] gh/rtimpe/1/base -> origin/gh/rtimpe/1/base 2025-12-04T08:57:43.9182918Z * [new branch] gh/rtimpe/1/head -> origin/gh/rtimpe/1/head 2025-12-04T08:57:43.9184309Z * [new branch] gh/rtimpe/2/base -> origin/gh/rtimpe/2/base 2025-12-04T08:57:43.9185387Z * [new branch] gh/rtimpe/2/head -> origin/gh/rtimpe/2/head 2025-12-04T08:57:43.9186899Z * [new branch] gh/rtimpe/22/base -> origin/gh/rtimpe/22/base 2025-12-04T08:57:43.9188119Z * [new branch] gh/rtimpe/22/head -> origin/gh/rtimpe/22/head 2025-12-04T08:57:43.9189235Z * [new branch] gh/rtimpe/22/orig -> origin/gh/rtimpe/22/orig 2025-12-04T08:57:43.9190735Z * [new branch] gh/rtimpe/23/base -> origin/gh/rtimpe/23/base 2025-12-04T08:57:43.9191859Z * [new branch] gh/rtimpe/23/head -> origin/gh/rtimpe/23/head 2025-12-04T08:57:43.9193050Z * [new branch] gh/rtimpe/23/orig -> origin/gh/rtimpe/23/orig 2025-12-04T08:57:43.9194351Z * [new branch] gh/rtimpe/24/base -> origin/gh/rtimpe/24/base 2025-12-04T08:57:43.9195539Z * [new branch] gh/rtimpe/24/head -> origin/gh/rtimpe/24/head 2025-12-04T08:57:43.9196619Z * [new branch] gh/rtimpe/24/orig -> origin/gh/rtimpe/24/orig 2025-12-04T08:57:43.9198036Z * [new branch] gh/rtimpe/25/base -> origin/gh/rtimpe/25/base 2025-12-04T08:57:43.9199121Z * [new branch] gh/rtimpe/25/head -> origin/gh/rtimpe/25/head 2025-12-04T08:57:43.9200231Z * [new branch] gh/rtimpe/25/orig -> origin/gh/rtimpe/25/orig 2025-12-04T08:57:43.9201662Z * [new branch] gh/rtimpe/26/base -> origin/gh/rtimpe/26/base 2025-12-04T08:57:43.9202862Z * [new branch] gh/rtimpe/26/head -> origin/gh/rtimpe/26/head 2025-12-04T08:57:43.9203953Z * [new branch] gh/rtimpe/26/orig -> origin/gh/rtimpe/26/orig 2025-12-04T08:57:43.9205749Z * [new branch] gh/rtimpe/27/base -> origin/gh/rtimpe/27/base 2025-12-04T08:57:43.9206866Z * [new branch] gh/rtimpe/27/head -> origin/gh/rtimpe/27/head 2025-12-04T08:57:43.9207972Z * [new branch] gh/rtimpe/27/orig -> origin/gh/rtimpe/27/orig 2025-12-04T08:57:43.9209402Z * [new branch] gh/rtimpe/28/base -> origin/gh/rtimpe/28/base 2025-12-04T08:57:43.9210926Z * [new branch] gh/rtimpe/28/head -> origin/gh/rtimpe/28/head 2025-12-04T08:57:43.9212050Z * [new branch] gh/rtimpe/28/orig -> origin/gh/rtimpe/28/orig 2025-12-04T08:57:43.9213797Z * [new branch] gh/rtimpe/29/base -> origin/gh/rtimpe/29/base 2025-12-04T08:57:43.9214894Z * [new branch] gh/rtimpe/29/head -> origin/gh/rtimpe/29/head 2025-12-04T08:57:43.9216028Z * [new branch] gh/rtimpe/29/orig -> 
origin/gh/rtimpe/29/orig 2025-12-04T08:57:43.9217478Z * [new branch] gh/rtimpe/3/base -> origin/gh/rtimpe/3/base 2025-12-04T08:57:43.9218603Z * [new branch] gh/rtimpe/3/head -> origin/gh/rtimpe/3/head 2025-12-04T08:57:43.9220035Z * [new branch] gh/rtimpe/30/base -> origin/gh/rtimpe/30/base 2025-12-04T08:57:43.9221123Z * [new branch] gh/rtimpe/30/head -> origin/gh/rtimpe/30/head 2025-12-04T08:57:43.9222295Z * [new branch] gh/rtimpe/30/orig -> origin/gh/rtimpe/30/orig 2025-12-04T08:57:43.9223715Z * [new branch] gh/rtimpe/31/base -> origin/gh/rtimpe/31/base 2025-12-04T08:57:43.9224828Z * [new branch] gh/rtimpe/31/head -> origin/gh/rtimpe/31/head 2025-12-04T08:57:43.9226135Z * [new branch] gh/rtimpe/31/orig -> origin/gh/rtimpe/31/orig 2025-12-04T08:57:43.9227593Z * [new branch] gh/rtimpe/32/base -> origin/gh/rtimpe/32/base 2025-12-04T08:57:43.9228669Z * [new branch] gh/rtimpe/32/head -> origin/gh/rtimpe/32/head 2025-12-04T08:57:43.9229788Z * [new branch] gh/rtimpe/32/orig -> origin/gh/rtimpe/32/orig 2025-12-04T08:57:43.9231257Z * [new branch] gh/rtimpe/33/base -> origin/gh/rtimpe/33/base 2025-12-04T08:57:43.9232387Z * [new branch] gh/rtimpe/33/head -> origin/gh/rtimpe/33/head 2025-12-04T08:57:43.9233627Z * [new branch] gh/rtimpe/33/orig -> origin/gh/rtimpe/33/orig 2025-12-04T08:57:43.9234987Z * [new branch] gh/rtimpe/34/base -> origin/gh/rtimpe/34/base 2025-12-04T08:57:43.9236082Z * [new branch] gh/rtimpe/34/head -> origin/gh/rtimpe/34/head 2025-12-04T08:57:43.9237140Z * [new branch] gh/rtimpe/34/orig -> origin/gh/rtimpe/34/orig 2025-12-04T08:57:43.9238682Z * [new branch] gh/rtimpe/35/base -> origin/gh/rtimpe/35/base 2025-12-04T08:57:43.9239757Z * [new branch] gh/rtimpe/35/head -> origin/gh/rtimpe/35/head 2025-12-04T08:57:43.9240870Z * [new branch] gh/rtimpe/35/orig -> origin/gh/rtimpe/35/orig 2025-12-04T08:57:43.9242388Z * [new branch] gh/rtimpe/4/base -> origin/gh/rtimpe/4/base 2025-12-04T08:57:43.9243491Z * [new branch] gh/rtimpe/4/head -> origin/gh/rtimpe/4/head 2025-12-04T08:57:43.9245349Z * [new branch] gh/ruisizhang123/1/base -> origin/gh/ruisizhang123/1/base 2025-12-04T08:57:43.9246416Z * [new branch] gh/ruisizhang123/1/head -> origin/gh/ruisizhang123/1/head 2025-12-04T08:57:43.9247611Z * [new branch] gh/ruisizhang123/1/orig -> origin/gh/ruisizhang123/1/orig 2025-12-04T08:57:43.9249022Z * [new branch] gh/ruisizhang123/4/base -> origin/gh/ruisizhang123/4/base 2025-12-04T08:57:43.9250110Z * [new branch] gh/ruisizhang123/4/head -> origin/gh/ruisizhang123/4/head 2025-12-04T08:57:43.9251231Z * [new branch] gh/ruisizhang123/4/orig -> origin/gh/ruisizhang123/4/orig 2025-12-04T08:57:43.9252682Z * [new branch] gh/ruisizhang123/5/base -> origin/gh/ruisizhang123/5/base 2025-12-04T08:57:43.9254110Z * [new branch] gh/ruisizhang123/5/head -> origin/gh/ruisizhang123/5/head 2025-12-04T08:57:43.9255226Z * [new branch] gh/ruisizhang123/5/orig -> origin/gh/ruisizhang123/5/orig 2025-12-04T08:57:43.9256679Z * [new branch] gh/ruisizhang123/6/base -> origin/gh/ruisizhang123/6/base 2025-12-04T08:57:43.9257797Z * [new branch] gh/ruisizhang123/6/head -> origin/gh/ruisizhang123/6/head 2025-12-04T08:57:43.9258894Z * [new branch] gh/ruisizhang123/6/orig -> origin/gh/ruisizhang123/6/orig 2025-12-04T08:57:43.9260498Z * [new branch] gh/ruisizhang123/7/base -> origin/gh/ruisizhang123/7/base 2025-12-04T08:57:43.9261634Z * [new branch] gh/ruisizhang123/7/head -> origin/gh/ruisizhang123/7/head 2025-12-04T08:57:43.9262858Z * [new branch] gh/ruisizhang123/7/orig -> origin/gh/ruisizhang123/7/orig 
2025-12-04T08:57:43.9264237Z * [new branch] gh/ruisizhang123/8/base -> origin/gh/ruisizhang123/8/base 2025-12-04T08:57:43.9265467Z * [new branch] gh/ruisizhang123/8/head -> origin/gh/ruisizhang123/8/head 2025-12-04T08:57:43.9266580Z * [new branch] gh/ruisizhang123/8/orig -> origin/gh/ruisizhang123/8/orig 2025-12-04T08:57:43.9268045Z * [new branch] gh/ruisizhang123/9/base -> origin/gh/ruisizhang123/9/base 2025-12-04T08:57:43.9269149Z * [new branch] gh/ruisizhang123/9/head -> origin/gh/ruisizhang123/9/head 2025-12-04T08:57:43.9270224Z * [new branch] gh/ruisizhang123/9/orig -> origin/gh/ruisizhang123/9/orig 2025-12-04T08:57:43.9272047Z * [new branch] gh/seemethere/52/base -> origin/gh/seemethere/52/base 2025-12-04T08:57:43.9273114Z * [new branch] gh/seemethere/52/head -> origin/gh/seemethere/52/head 2025-12-04T08:57:43.9274246Z * [new branch] gh/seemethere/52/orig -> origin/gh/seemethere/52/orig 2025-12-04T08:57:43.9275724Z * [new branch] gh/seemethere/53/base -> origin/gh/seemethere/53/base 2025-12-04T08:57:43.9276904Z * [new branch] gh/seemethere/53/head -> origin/gh/seemethere/53/head 2025-12-04T08:57:43.9277994Z * [new branch] gh/seemethere/53/orig -> origin/gh/seemethere/53/orig 2025-12-04T08:57:43.9279856Z * [new branch] gh/seemethere/54/base -> origin/gh/seemethere/54/base 2025-12-04T08:57:43.9280997Z * [new branch] gh/seemethere/54/head -> origin/gh/seemethere/54/head 2025-12-04T08:57:43.9282173Z * [new branch] gh/seemethere/54/orig -> origin/gh/seemethere/54/orig 2025-12-04T08:57:43.9283685Z * [new branch] gh/seemethere/55/base -> origin/gh/seemethere/55/base 2025-12-04T08:57:43.9284593Z * [new branch] gh/seemethere/55/head -> origin/gh/seemethere/55/head 2025-12-04T08:57:43.9285765Z * [new branch] gh/seemethere/55/orig -> origin/gh/seemethere/55/orig 2025-12-04T08:57:43.9287228Z * [new branch] gh/seemethere/59/base -> origin/gh/seemethere/59/base 2025-12-04T08:57:43.9288360Z * [new branch] gh/seemethere/59/head -> origin/gh/seemethere/59/head 2025-12-04T08:57:43.9289477Z * [new branch] gh/seemethere/59/orig -> origin/gh/seemethere/59/orig 2025-12-04T08:57:43.9290938Z * [new branch] gh/seemethere/62/base -> origin/gh/seemethere/62/base 2025-12-04T08:57:43.9292277Z * [new branch] gh/seemethere/62/head -> origin/gh/seemethere/62/head 2025-12-04T08:57:43.9293639Z * [new branch] gh/seemethere/62/orig -> origin/gh/seemethere/62/orig 2025-12-04T08:57:43.9295128Z * [new branch] gh/seemethere/63/base -> origin/gh/seemethere/63/base 2025-12-04T08:57:43.9296280Z * [new branch] gh/seemethere/63/head -> origin/gh/seemethere/63/head 2025-12-04T08:57:43.9297426Z * [new branch] gh/seemethere/63/orig -> origin/gh/seemethere/63/orig 2025-12-04T08:57:43.9298882Z * [new branch] gh/seemethere/71/base -> origin/gh/seemethere/71/base 2025-12-04T08:57:43.9300002Z * [new branch] gh/seemethere/71/head -> origin/gh/seemethere/71/head 2025-12-04T08:57:43.9301170Z * [new branch] gh/seemethere/71/orig -> origin/gh/seemethere/71/orig 2025-12-04T08:57:43.9302779Z * [new branch] gh/seemethere/72/base -> origin/gh/seemethere/72/base 2025-12-04T08:57:43.9304373Z * [new branch] gh/seemethere/72/head -> origin/gh/seemethere/72/head 2025-12-04T08:57:43.9305669Z * [new branch] gh/seemethere/72/orig -> origin/gh/seemethere/72/orig 2025-12-04T08:57:43.9307149Z * [new branch] gh/seemethere/73/base -> origin/gh/seemethere/73/base 2025-12-04T08:57:43.9308455Z * [new branch] gh/seemethere/73/head -> origin/gh/seemethere/73/head 2025-12-04T08:57:43.9309622Z * [new branch] gh/seemethere/73/orig -> origin/gh/seemethere/73/orig 
2025-12-04T08:57:43.9311466Z * [new branch] gh/seemethere/74/base -> origin/gh/seemethere/74/base 2025-12-04T08:57:43.9312564Z * [new branch] gh/seemethere/74/head -> origin/gh/seemethere/74/head 2025-12-04T08:57:43.9313706Z * [new branch] gh/seemethere/74/orig -> origin/gh/seemethere/74/orig 2025-12-04T08:57:43.9315187Z * [new branch] gh/seemethere/75/base -> origin/gh/seemethere/75/base 2025-12-04T08:57:43.9316249Z * [new branch] gh/seemethere/75/head -> origin/gh/seemethere/75/head 2025-12-04T08:57:43.9317363Z * [new branch] gh/seemethere/75/orig -> origin/gh/seemethere/75/orig 2025-12-04T08:57:43.9319247Z * [new branch] gh/seemethere/76/base -> origin/gh/seemethere/76/base 2025-12-04T08:57:43.9320327Z * [new branch] gh/seemethere/76/head -> origin/gh/seemethere/76/head 2025-12-04T08:57:43.9321422Z * [new branch] gh/seemethere/76/orig -> origin/gh/seemethere/76/orig 2025-12-04T08:57:43.9323554Z * [new branch] gh/shunting314/145/base -> origin/gh/shunting314/145/base 2025-12-04T08:57:43.9324767Z * [new branch] gh/shunting314/145/head -> origin/gh/shunting314/145/head 2025-12-04T08:57:43.9325912Z * [new branch] gh/shunting314/145/orig -> origin/gh/shunting314/145/orig 2025-12-04T08:57:43.9327639Z * [new branch] gh/shunting314/176/base -> origin/gh/shunting314/176/base 2025-12-04T08:57:43.9329120Z * [new branch] gh/shunting314/176/head -> origin/gh/shunting314/176/head 2025-12-04T08:57:43.9330131Z * [new branch] gh/shunting314/176/orig -> origin/gh/shunting314/176/orig 2025-12-04T08:57:43.9331675Z * [new branch] gh/shunting314/249/base -> origin/gh/shunting314/249/base 2025-12-04T08:57:43.9332877Z * [new branch] gh/shunting314/249/head -> origin/gh/shunting314/249/head 2025-12-04T08:57:43.9334552Z * [new branch] gh/shunting314/249/orig -> origin/gh/shunting314/249/orig 2025-12-04T08:57:43.9336005Z * [new branch] gh/shunting314/253/base -> origin/gh/shunting314/253/base 2025-12-04T08:57:43.9337073Z * [new branch] gh/shunting314/253/head -> origin/gh/shunting314/253/head 2025-12-04T08:57:43.9338206Z * [new branch] gh/shunting314/253/orig -> origin/gh/shunting314/253/orig 2025-12-04T08:57:43.9339818Z * [new branch] gh/shunting314/256/base -> origin/gh/shunting314/256/base 2025-12-04T08:57:43.9340962Z * [new branch] gh/shunting314/256/head -> origin/gh/shunting314/256/head 2025-12-04T08:57:43.9342071Z * [new branch] gh/shunting314/256/orig -> origin/gh/shunting314/256/orig 2025-12-04T08:57:43.9344013Z * [new branch] gh/shunting314/257/base -> origin/gh/shunting314/257/base 2025-12-04T08:57:43.9345228Z * [new branch] gh/shunting314/257/head -> origin/gh/shunting314/257/head 2025-12-04T08:57:43.9346410Z * [new branch] gh/shunting314/257/orig -> origin/gh/shunting314/257/orig 2025-12-04T08:57:43.9348066Z * [new branch] gh/shunting314/258/base -> origin/gh/shunting314/258/base 2025-12-04T08:57:43.9349133Z * [new branch] gh/shunting314/258/head -> origin/gh/shunting314/258/head 2025-12-04T08:57:43.9350272Z * [new branch] gh/shunting314/258/orig -> origin/gh/shunting314/258/orig 2025-12-04T08:57:43.9351562Z * [new branch] gh/shunting314/259/base -> origin/gh/shunting314/259/base 2025-12-04T08:57:43.9352643Z * [new branch] gh/shunting314/259/head -> origin/gh/shunting314/259/head 2025-12-04T08:57:43.9353744Z * [new branch] gh/shunting314/259/orig -> origin/gh/shunting314/259/orig 2025-12-04T08:57:43.9355418Z * [new branch] gh/shunting314/260/base -> origin/gh/shunting314/260/base 2025-12-04T08:57:43.9356577Z * [new branch] gh/shunting314/260/head -> origin/gh/shunting314/260/head 
2025-12-04T08:57:43.9358239Z * [new branch] gh/shunting314/260/orig -> origin/gh/shunting314/260/orig 2025-12-04T08:57:43.9359795Z * [new branch] gh/shunting314/261/base -> origin/gh/shunting314/261/base 2025-12-04T08:57:43.9360964Z * [new branch] gh/shunting314/261/head -> origin/gh/shunting314/261/head 2025-12-04T08:57:43.9362090Z * [new branch] gh/shunting314/261/orig -> origin/gh/shunting314/261/orig 2025-12-04T08:57:43.9363660Z * [new branch] gh/shunting314/262/base -> origin/gh/shunting314/262/base 2025-12-04T08:57:43.9364865Z * [new branch] gh/shunting314/262/head -> origin/gh/shunting314/262/head 2025-12-04T08:57:43.9365983Z * [new branch] gh/shunting314/262/orig -> origin/gh/shunting314/262/orig 2025-12-04T08:57:43.9367511Z * [new branch] gh/shunting314/263/base -> origin/gh/shunting314/263/base 2025-12-04T08:57:43.9368776Z * [new branch] gh/shunting314/263/head -> origin/gh/shunting314/263/head 2025-12-04T08:57:43.9369864Z * [new branch] gh/shunting314/263/orig -> origin/gh/shunting314/263/orig 2025-12-04T08:57:43.9371470Z * [new branch] gh/shunting314/264/base -> origin/gh/shunting314/264/base 2025-12-04T08:57:43.9372598Z * [new branch] gh/shunting314/264/head -> origin/gh/shunting314/264/head 2025-12-04T08:57:43.9374107Z * [new branch] gh/shunting314/264/orig -> origin/gh/shunting314/264/orig 2025-12-04T08:57:43.9375729Z * [new branch] gh/shunting314/265/base -> origin/gh/shunting314/265/base 2025-12-04T08:57:43.9376732Z * [new branch] gh/shunting314/265/head -> origin/gh/shunting314/265/head 2025-12-04T08:57:43.9378033Z * [new branch] gh/shunting314/265/orig -> origin/gh/shunting314/265/orig 2025-12-04T08:57:43.9381670Z * [new branch] gh/shunting314/266/base -> origin/gh/shunting314/266/base 2025-12-04T08:57:43.9383038Z * [new branch] gh/shunting314/266/head -> origin/gh/shunting314/266/head 2025-12-04T08:57:43.9384178Z * [new branch] gh/shunting314/266/orig -> origin/gh/shunting314/266/orig 2025-12-04T08:57:43.9386061Z * [new branch] gh/shunting314/267/base -> origin/gh/shunting314/267/base 2025-12-04T08:57:43.9387341Z * [new branch] gh/shunting314/267/head -> origin/gh/shunting314/267/head 2025-12-04T08:57:43.9388482Z * [new branch] gh/shunting314/267/orig -> origin/gh/shunting314/267/orig 2025-12-04T08:57:43.9390633Z * [new branch] gh/shunting314/268/base -> origin/gh/shunting314/268/base 2025-12-04T08:57:43.9391804Z * [new branch] gh/shunting314/268/head -> origin/gh/shunting314/268/head 2025-12-04T08:57:43.9392907Z * [new branch] gh/shunting314/268/orig -> origin/gh/shunting314/268/orig 2025-12-04T08:57:43.9394470Z * [new branch] gh/shunting314/269/base -> origin/gh/shunting314/269/base 2025-12-04T08:57:43.9395534Z * [new branch] gh/shunting314/269/head -> origin/gh/shunting314/269/head 2025-12-04T08:57:43.9396635Z * [new branch] gh/shunting314/269/orig -> origin/gh/shunting314/269/orig 2025-12-04T08:57:43.9398335Z * [new branch] gh/silverguo/1/base -> origin/gh/silverguo/1/base 2025-12-04T08:57:43.9399475Z * [new branch] gh/silverguo/1/head -> origin/gh/silverguo/1/head 2025-12-04T08:57:43.9400749Z * [new branch] gh/silverguo/2/base -> origin/gh/silverguo/2/base 2025-12-04T08:57:43.9401757Z * [new branch] gh/silverguo/2/head -> origin/gh/silverguo/2/head 2025-12-04T08:57:43.9403079Z * [new branch] gh/silverguo/3/base -> origin/gh/silverguo/3/base 2025-12-04T08:57:43.9404316Z * [new branch] gh/silverguo/3/head -> origin/gh/silverguo/3/head 2025-12-04T08:57:43.9405586Z * [new branch] gh/silverguo/4/base -> origin/gh/silverguo/4/base 2025-12-04T08:57:43.9406638Z * [new 
branch] gh/silverguo/4/head -> origin/gh/silverguo/4/head 2025-12-04T08:57:43.9408378Z * [new branch] gh/slayton58/39/base -> origin/gh/slayton58/39/base 2025-12-04T08:57:43.9409487Z * [new branch] gh/slayton58/39/head -> origin/gh/slayton58/39/head 2025-12-04T08:57:43.9410614Z * [new branch] gh/slayton58/39/orig -> origin/gh/slayton58/39/orig 2025-12-04T08:57:43.9412100Z * [new branch] gh/slayton58/42/base -> origin/gh/slayton58/42/base 2025-12-04T08:57:43.9413265Z * [new branch] gh/slayton58/42/head -> origin/gh/slayton58/42/head 2025-12-04T08:57:43.9414799Z * [new branch] gh/slayton58/42/orig -> origin/gh/slayton58/42/orig 2025-12-04T08:57:43.9416274Z * [new branch] gh/slayton58/43/base -> origin/gh/slayton58/43/base 2025-12-04T08:57:43.9417490Z * [new branch] gh/slayton58/43/head -> origin/gh/slayton58/43/head 2025-12-04T08:57:43.9418742Z * [new branch] gh/slayton58/43/orig -> origin/gh/slayton58/43/orig 2025-12-04T08:57:43.9420418Z * [new branch] gh/slayton58/44/base -> origin/gh/slayton58/44/base 2025-12-04T08:57:43.9421995Z * [new branch] gh/slayton58/44/head -> origin/gh/slayton58/44/head 2025-12-04T08:57:43.9423307Z * [new branch] gh/slayton58/44/orig -> origin/gh/slayton58/44/orig 2025-12-04T08:57:43.9424699Z * [new branch] gh/slayton58/45/base -> origin/gh/slayton58/45/base 2025-12-04T08:57:43.9425934Z * [new branch] gh/slayton58/45/head -> origin/gh/slayton58/45/head 2025-12-04T08:57:43.9427098Z * [new branch] gh/slayton58/45/orig -> origin/gh/slayton58/45/orig 2025-12-04T08:57:43.9428539Z * [new branch] gh/slayton58/46/base -> origin/gh/slayton58/46/base 2025-12-04T08:57:43.9429761Z * [new branch] gh/slayton58/46/head -> origin/gh/slayton58/46/head 2025-12-04T08:57:43.9430850Z * [new branch] gh/slayton58/46/orig -> origin/gh/slayton58/46/orig 2025-12-04T08:57:43.9432353Z * [new branch] gh/slayton58/6/base -> origin/gh/slayton58/6/base 2025-12-04T08:57:43.9433521Z * [new branch] gh/slayton58/6/head -> origin/gh/slayton58/6/head 2025-12-04T08:57:43.9434909Z * [new branch] gh/slayton58/7/base -> origin/gh/slayton58/7/base 2025-12-04T08:57:43.9435933Z * [new branch] gh/slayton58/7/head -> origin/gh/slayton58/7/head 2025-12-04T08:57:43.9437926Z * [new branch] gh/soulitzer/269/base -> origin/gh/soulitzer/269/base 2025-12-04T08:57:43.9438970Z * [new branch] gh/soulitzer/269/head -> origin/gh/soulitzer/269/head 2025-12-04T08:57:43.9440138Z * [new branch] gh/soulitzer/269/orig -> origin/gh/soulitzer/269/orig 2025-12-04T08:57:43.9441692Z * [new branch] gh/soulitzer/276/base -> origin/gh/soulitzer/276/base 2025-12-04T08:57:43.9442784Z * [new branch] gh/soulitzer/276/head -> origin/gh/soulitzer/276/head 2025-12-04T08:57:43.9443861Z * [new branch] gh/soulitzer/276/orig -> origin/gh/soulitzer/276/orig 2025-12-04T08:57:43.9445775Z * [new branch] gh/soulitzer/287/base -> origin/gh/soulitzer/287/base 2025-12-04T08:57:43.9446854Z * [new branch] gh/soulitzer/287/head -> origin/gh/soulitzer/287/head 2025-12-04T08:57:43.9448001Z * [new branch] gh/soulitzer/287/orig -> origin/gh/soulitzer/287/orig 2025-12-04T08:57:43.9449684Z * [new branch] gh/soulitzer/296/base -> origin/gh/soulitzer/296/base 2025-12-04T08:57:43.9450802Z * [new branch] gh/soulitzer/296/head -> origin/gh/soulitzer/296/head 2025-12-04T08:57:43.9452008Z * [new branch] gh/soulitzer/296/orig -> origin/gh/soulitzer/296/orig 2025-12-04T08:57:43.9453842Z * [new branch] gh/soulitzer/299/base -> origin/gh/soulitzer/299/base 2025-12-04T08:57:43.9455103Z * [new branch] gh/soulitzer/299/head -> origin/gh/soulitzer/299/head 
2025-12-04T08:57:43.9456230Z * [new branch] gh/soulitzer/299/orig -> origin/gh/soulitzer/299/orig 2025-12-04T08:57:43.9457758Z * [new branch] gh/soulitzer/300/base -> origin/gh/soulitzer/300/base 2025-12-04T08:57:43.9458986Z * [new branch] gh/soulitzer/300/head -> origin/gh/soulitzer/300/head 2025-12-04T08:57:43.9460085Z * [new branch] gh/soulitzer/300/orig -> origin/gh/soulitzer/300/orig 2025-12-04T08:57:43.9461748Z * [new branch] gh/soulitzer/301/base -> origin/gh/soulitzer/301/base 2025-12-04T08:57:43.9462894Z * [new branch] gh/soulitzer/301/head -> origin/gh/soulitzer/301/head 2025-12-04T08:57:43.9464054Z * [new branch] gh/soulitzer/301/orig -> origin/gh/soulitzer/301/orig 2025-12-04T08:57:43.9465749Z * [new branch] gh/soulitzer/313/base -> origin/gh/soulitzer/313/base 2025-12-04T08:57:43.9466847Z * [new branch] gh/soulitzer/313/head -> origin/gh/soulitzer/313/head 2025-12-04T08:57:43.9467947Z * [new branch] gh/soulitzer/313/orig -> origin/gh/soulitzer/313/orig 2025-12-04T08:57:43.9469456Z * [new branch] gh/soulitzer/319/base -> origin/gh/soulitzer/319/base 2025-12-04T08:57:43.9470484Z * [new branch] gh/soulitzer/319/head -> origin/gh/soulitzer/319/head 2025-12-04T08:57:43.9471561Z * [new branch] gh/soulitzer/319/orig -> origin/gh/soulitzer/319/orig 2025-12-04T08:57:43.9473152Z * [new branch] gh/soulitzer/320/base -> origin/gh/soulitzer/320/base 2025-12-04T08:57:43.9474225Z * [new branch] gh/soulitzer/320/head -> origin/gh/soulitzer/320/head 2025-12-04T08:57:43.9475700Z * [new branch] gh/soulitzer/320/orig -> origin/gh/soulitzer/320/orig 2025-12-04T08:57:43.9477264Z * [new branch] gh/soulitzer/336/base -> origin/gh/soulitzer/336/base 2025-12-04T08:57:43.9478298Z * [new branch] gh/soulitzer/336/head -> origin/gh/soulitzer/336/head 2025-12-04T08:57:43.9479983Z * [new branch] gh/soulitzer/336/orig -> origin/gh/soulitzer/336/orig 2025-12-04T08:57:43.9482024Z * [new branch] gh/soulitzer/347/base -> origin/gh/soulitzer/347/base 2025-12-04T08:57:43.9483089Z * [new branch] gh/soulitzer/347/head -> origin/gh/soulitzer/347/head 2025-12-04T08:57:43.9484200Z * [new branch] gh/soulitzer/347/orig -> origin/gh/soulitzer/347/orig 2025-12-04T08:57:43.9485997Z * [new branch] gh/soulitzer/349/base -> origin/gh/soulitzer/349/base 2025-12-04T08:57:43.9487160Z * [new branch] gh/soulitzer/349/head -> origin/gh/soulitzer/349/head 2025-12-04T08:57:43.9488358Z * [new branch] gh/soulitzer/349/orig -> origin/gh/soulitzer/349/orig 2025-12-04T08:57:43.9489829Z * [new branch] gh/soulitzer/350/base -> origin/gh/soulitzer/350/base 2025-12-04T08:57:43.9491066Z * [new branch] gh/soulitzer/350/head -> origin/gh/soulitzer/350/head 2025-12-04T08:57:43.9492147Z * [new branch] gh/soulitzer/350/orig -> origin/gh/soulitzer/350/orig 2025-12-04T08:57:43.9494005Z * [new branch] gh/soulitzer/351/base -> origin/gh/soulitzer/351/base 2025-12-04T08:57:43.9495161Z * [new branch] gh/soulitzer/351/head -> origin/gh/soulitzer/351/head 2025-12-04T08:57:43.9496325Z * [new branch] gh/soulitzer/351/orig -> origin/gh/soulitzer/351/orig 2025-12-04T08:57:43.9497917Z * [new branch] gh/soulitzer/353/base -> origin/gh/soulitzer/353/base 2025-12-04T08:57:43.9499205Z * [new branch] gh/soulitzer/353/head -> origin/gh/soulitzer/353/head 2025-12-04T08:57:43.9500302Z * [new branch] gh/soulitzer/353/orig -> origin/gh/soulitzer/353/orig 2025-12-04T08:57:43.9502544Z * [new branch] gh/soulitzer/358/base -> origin/gh/soulitzer/358/base 2025-12-04T08:57:43.9503875Z * [new branch] gh/soulitzer/358/head -> origin/gh/soulitzer/358/head 
2025-12-04T08:57:43.9505522Z * [new branch] gh/soulitzer/358/orig -> origin/gh/soulitzer/358/orig 2025-12-04T08:57:43.9507616Z * [new branch] gh/soulitzer/359/base -> origin/gh/soulitzer/359/base 2025-12-04T08:57:43.9508736Z * [new branch] gh/soulitzer/359/head -> origin/gh/soulitzer/359/head 2025-12-04T08:57:43.9509928Z * [new branch] gh/soulitzer/359/orig -> origin/gh/soulitzer/359/orig 2025-12-04T08:57:43.9511425Z * [new branch] gh/soulitzer/374/base -> origin/gh/soulitzer/374/base 2025-12-04T08:57:43.9512532Z * [new branch] gh/soulitzer/374/head -> origin/gh/soulitzer/374/head 2025-12-04T08:57:43.9513668Z * [new branch] gh/soulitzer/374/orig -> origin/gh/soulitzer/374/orig 2025-12-04T08:57:43.9515369Z * [new branch] gh/soulitzer/375/base -> origin/gh/soulitzer/375/base 2025-12-04T08:57:43.9516436Z * [new branch] gh/soulitzer/375/head -> origin/gh/soulitzer/375/head 2025-12-04T08:57:43.9517683Z * [new branch] gh/soulitzer/375/orig -> origin/gh/soulitzer/375/orig 2025-12-04T08:57:43.9519069Z * [new branch] gh/soulitzer/380/base -> origin/gh/soulitzer/380/base 2025-12-04T08:57:43.9520135Z * [new branch] gh/soulitzer/380/head -> origin/gh/soulitzer/380/head 2025-12-04T08:57:43.9521225Z * [new branch] gh/soulitzer/380/orig -> origin/gh/soulitzer/380/orig 2025-12-04T08:57:43.9522730Z * [new branch] gh/soulitzer/385/base -> origin/gh/soulitzer/385/base 2025-12-04T08:57:43.9523826Z * [new branch] gh/soulitzer/385/head -> origin/gh/soulitzer/385/head 2025-12-04T08:57:43.9524944Z * [new branch] gh/soulitzer/385/orig -> origin/gh/soulitzer/385/orig 2025-12-04T08:57:43.9526495Z * [new branch] gh/soulitzer/386/base -> origin/gh/soulitzer/386/base 2025-12-04T08:57:43.9527617Z * [new branch] gh/soulitzer/386/head -> origin/gh/soulitzer/386/head 2025-12-04T08:57:43.9528740Z * [new branch] gh/soulitzer/386/orig -> origin/gh/soulitzer/386/orig 2025-12-04T08:57:43.9530804Z * [new branch] gh/soulitzer/387/base -> origin/gh/soulitzer/387/base 2025-12-04T08:57:43.9531901Z * [new branch] gh/soulitzer/387/head -> origin/gh/soulitzer/387/head 2025-12-04T08:57:43.9533043Z * [new branch] gh/soulitzer/387/orig -> origin/gh/soulitzer/387/orig 2025-12-04T08:57:43.9534889Z * [new branch] gh/soulitzer/388/base -> origin/gh/soulitzer/388/base 2025-12-04T08:57:43.9536042Z * [new branch] gh/soulitzer/388/head -> origin/gh/soulitzer/388/head 2025-12-04T08:57:43.9537202Z * [new branch] gh/soulitzer/388/orig -> origin/gh/soulitzer/388/orig 2025-12-04T08:57:43.9538748Z * [new branch] gh/soulitzer/389/base -> origin/gh/soulitzer/389/base 2025-12-04T08:57:43.9539859Z * [new branch] gh/soulitzer/389/head -> origin/gh/soulitzer/389/head 2025-12-04T08:57:43.9541058Z * [new branch] gh/soulitzer/389/orig -> origin/gh/soulitzer/389/orig 2025-12-04T08:57:43.9542593Z * [new branch] gh/soulitzer/390/base -> origin/gh/soulitzer/390/base 2025-12-04T08:57:43.9543733Z * [new branch] gh/soulitzer/390/head -> origin/gh/soulitzer/390/head 2025-12-04T08:57:43.9544861Z * [new branch] gh/soulitzer/390/orig -> origin/gh/soulitzer/390/orig 2025-12-04T08:57:43.9546567Z * [new branch] gh/soulitzer/391/base -> origin/gh/soulitzer/391/base 2025-12-04T08:57:43.9547654Z * [new branch] gh/soulitzer/391/head -> origin/gh/soulitzer/391/head 2025-12-04T08:57:43.9548763Z * [new branch] gh/soulitzer/391/orig -> origin/gh/soulitzer/391/orig 2025-12-04T08:57:43.9550268Z * [new branch] gh/soulitzer/392/base -> origin/gh/soulitzer/392/base 2025-12-04T08:57:43.9551378Z * [new branch] gh/soulitzer/392/head -> origin/gh/soulitzer/392/head 
2025-12-04T08:57:43.9552493Z * [new branch] gh/soulitzer/392/orig -> origin/gh/soulitzer/392/orig 2025-12-04T08:57:43.9554277Z * [new branch] gh/swolchok/728/next -> origin/gh/swolchok/728/next 2025-12-04T08:57:43.9555975Z * [new branch] gh/swolchok/819/base -> origin/gh/swolchok/819/base 2025-12-04T08:57:43.9557040Z * [new branch] gh/swolchok/819/head -> origin/gh/swolchok/819/head 2025-12-04T08:57:43.9558125Z * [new branch] gh/swolchok/819/orig -> origin/gh/swolchok/819/orig 2025-12-04T08:57:43.9559728Z * [new branch] gh/swolchok/824/base -> origin/gh/swolchok/824/base 2025-12-04T08:57:43.9560822Z * [new branch] gh/swolchok/824/head -> origin/gh/swolchok/824/head 2025-12-04T08:57:43.9561890Z * [new branch] gh/swolchok/824/orig -> origin/gh/swolchok/824/orig 2025-12-04T08:57:43.9563433Z * [new branch] gh/swolchok/829/base -> origin/gh/swolchok/829/base 2025-12-04T08:57:43.9564446Z * [new branch] gh/swolchok/829/head -> origin/gh/swolchok/829/head 2025-12-04T08:57:43.9565613Z * [new branch] gh/swolchok/829/orig -> origin/gh/swolchok/829/orig 2025-12-04T08:57:43.9567083Z * [new branch] gh/swolchok/839/base -> origin/gh/swolchok/839/base 2025-12-04T08:57:43.9568166Z * [new branch] gh/swolchok/839/head -> origin/gh/swolchok/839/head 2025-12-04T08:57:43.9569693Z * [new branch] gh/swolchok/839/orig -> origin/gh/swolchok/839/orig 2025-12-04T08:57:43.9571166Z * [new branch] gh/swolchok/841/base -> origin/gh/swolchok/841/base 2025-12-04T08:57:43.9572249Z * [new branch] gh/swolchok/841/head -> origin/gh/swolchok/841/head 2025-12-04T08:57:43.9573613Z * [new branch] gh/swolchok/841/orig -> origin/gh/swolchok/841/orig 2025-12-04T08:57:43.9575313Z * [new branch] gh/swolchok/842/base -> origin/gh/swolchok/842/base 2025-12-04T08:57:43.9576438Z * [new branch] gh/swolchok/842/head -> origin/gh/swolchok/842/head 2025-12-04T08:57:43.9577640Z * [new branch] gh/swolchok/842/orig -> origin/gh/swolchok/842/orig 2025-12-04T08:57:43.9579425Z * [new branch] gh/swolchok/845/base -> origin/gh/swolchok/845/base 2025-12-04T08:57:43.9580604Z * [new branch] gh/swolchok/845/head -> origin/gh/swolchok/845/head 2025-12-04T08:57:43.9581997Z * [new branch] gh/swolchok/845/orig -> origin/gh/swolchok/845/orig 2025-12-04T08:57:43.9583623Z * [new branch] gh/swolchok/848/base -> origin/gh/swolchok/848/base 2025-12-04T08:57:43.9584874Z * [new branch] gh/swolchok/848/head -> origin/gh/swolchok/848/head 2025-12-04T08:57:43.9586056Z * [new branch] gh/swolchok/848/orig -> origin/gh/swolchok/848/orig 2025-12-04T08:57:43.9588022Z * [new branch] gh/swolchok/856/base -> origin/gh/swolchok/856/base 2025-12-04T08:57:43.9589128Z * [new branch] gh/swolchok/856/head -> origin/gh/swolchok/856/head 2025-12-04T08:57:43.9590253Z * [new branch] gh/swolchok/856/orig -> origin/gh/swolchok/856/orig 2025-12-04T08:57:43.9592061Z * [new branch] gh/swolchok/860/base -> origin/gh/swolchok/860/base 2025-12-04T08:57:43.9593197Z * [new branch] gh/swolchok/860/head -> origin/gh/swolchok/860/head 2025-12-04T08:57:43.9594245Z * [new branch] gh/swolchok/860/orig -> origin/gh/swolchok/860/orig 2025-12-04T08:57:43.9595879Z * [new branch] gh/swolchok/861/base -> origin/gh/swolchok/861/base 2025-12-04T08:57:43.9597121Z * [new branch] gh/swolchok/861/head -> origin/gh/swolchok/861/head 2025-12-04T08:57:43.9598247Z * [new branch] gh/swolchok/861/orig -> origin/gh/swolchok/861/orig 2025-12-04T08:57:43.9599825Z * [new branch] gh/swolchok/862/base -> origin/gh/swolchok/862/base 2025-12-04T08:57:43.9600870Z * [new branch] gh/swolchok/862/head -> origin/gh/swolchok/862/head 
2025-12-04T08:57:43.9601978Z * [new branch] gh/swolchok/862/orig -> origin/gh/swolchok/862/orig 2025-12-04T08:57:43.9603657Z * [new branch] gh/swolchok/863/base -> origin/gh/swolchok/863/base 2025-12-04T08:57:43.9604728Z * [new branch] gh/swolchok/863/head -> origin/gh/swolchok/863/head 2025-12-04T08:57:43.9605905Z * [new branch] gh/swolchok/863/orig -> origin/gh/swolchok/863/orig 2025-12-04T08:57:43.9607538Z * [new branch] gh/swolchok/864/base -> origin/gh/swolchok/864/base 2025-12-04T08:57:43.9608667Z * [new branch] gh/swolchok/864/head -> origin/gh/swolchok/864/head 2025-12-04T08:57:43.9609941Z * [new branch] gh/swolchok/864/orig -> origin/gh/swolchok/864/orig 2025-12-04T08:57:43.9611298Z * [new branch] gh/swolchok/865/base -> origin/gh/swolchok/865/base 2025-12-04T08:57:43.9612620Z * [new branch] gh/swolchok/865/head -> origin/gh/swolchok/865/head 2025-12-04T08:57:43.9614037Z * [new branch] gh/swolchok/865/orig -> origin/gh/swolchok/865/orig 2025-12-04T08:57:43.9616130Z * [new branch] gh/swolchok/866/base -> origin/gh/swolchok/866/base 2025-12-04T08:57:43.9617300Z * [new branch] gh/swolchok/866/head -> origin/gh/swolchok/866/head 2025-12-04T08:57:43.9618451Z * [new branch] gh/swolchok/866/orig -> origin/gh/swolchok/866/orig 2025-12-04T08:57:43.9619953Z * [new branch] gh/swolchok/867/base -> origin/gh/swolchok/867/base 2025-12-04T08:57:43.9621119Z * [new branch] gh/swolchok/867/head -> origin/gh/swolchok/867/head 2025-12-04T08:57:43.9622263Z * [new branch] gh/swolchok/867/orig -> origin/gh/swolchok/867/orig 2025-12-04T08:57:43.9623853Z * [new branch] gh/swolchok/868/base -> origin/gh/swolchok/868/base 2025-12-04T08:57:43.9625012Z * [new branch] gh/swolchok/868/head -> origin/gh/swolchok/868/head 2025-12-04T08:57:43.9626243Z * [new branch] gh/swolchok/868/orig -> origin/gh/swolchok/868/orig 2025-12-04T08:57:43.9627755Z * [new branch] gh/swolchok/869/base -> origin/gh/swolchok/869/base 2025-12-04T08:57:43.9628963Z * [new branch] gh/swolchok/869/head -> origin/gh/swolchok/869/head 2025-12-04T08:57:43.9630091Z * [new branch] gh/swolchok/869/orig -> origin/gh/swolchok/869/orig 2025-12-04T08:57:43.9631656Z * [new branch] gh/swolchok/870/base -> origin/gh/swolchok/870/base 2025-12-04T08:57:43.9632713Z * [new branch] gh/swolchok/870/head -> origin/gh/swolchok/870/head 2025-12-04T08:57:43.9633795Z * [new branch] gh/swolchok/870/orig -> origin/gh/swolchok/870/orig 2025-12-04T08:57:43.9635353Z * [new branch] gh/swolchok/871/base -> origin/gh/swolchok/871/base 2025-12-04T08:57:43.9636520Z * [new branch] gh/swolchok/871/head -> origin/gh/swolchok/871/head 2025-12-04T08:57:43.9637751Z * [new branch] gh/swolchok/871/orig -> origin/gh/swolchok/871/orig 2025-12-04T08:57:43.9639781Z * [new branch] gh/teja-rao/4/base -> origin/gh/teja-rao/4/base 2025-12-04T08:57:43.9640935Z * [new branch] gh/teja-rao/4/head -> origin/gh/teja-rao/4/head 2025-12-04T08:57:43.9642068Z * [new branch] gh/teja-rao/4/orig -> origin/gh/teja-rao/4/orig 2025-12-04T08:57:43.9643830Z * [new branch] gh/tianyu-l/2/base -> origin/gh/tianyu-l/2/base 2025-12-04T08:57:43.9644983Z * [new branch] gh/tianyu-l/2/head -> origin/gh/tianyu-l/2/head 2025-12-04T08:57:43.9646045Z * [new branch] gh/tianyu-l/2/orig -> origin/gh/tianyu-l/2/orig 2025-12-04T08:57:43.9647490Z * [new branch] gh/tianyu-l/3/base -> origin/gh/tianyu-l/3/base 2025-12-04T08:57:43.9648630Z * [new branch] gh/tianyu-l/3/orig -> origin/gh/tianyu-l/3/orig 2025-12-04T08:57:43.9650203Z * [new branch] gh/tianyu-l/4/base -> origin/gh/tianyu-l/4/base 2025-12-04T08:57:43.9651246Z * [new 
branch] gh/tianyu-l/4/head -> origin/gh/tianyu-l/4/head 2025-12-04T08:57:43.9652356Z * [new branch] gh/tianyu-l/4/orig -> origin/gh/tianyu-l/4/orig 2025-12-04T08:57:43.9655089Z * [new branch] gh/tugsbayasgalan/10/base -> origin/gh/tugsbayasgalan/10/base 2025-12-04T08:57:43.9656277Z * [new branch] gh/tugsbayasgalan/10/head -> origin/gh/tugsbayasgalan/10/head 2025-12-04T08:57:43.9657583Z * [new branch] gh/tugsbayasgalan/10/orig -> origin/gh/tugsbayasgalan/10/orig 2025-12-04T08:57:43.9658979Z * [new branch] gh/tugsbayasgalan/13/base -> origin/gh/tugsbayasgalan/13/base 2025-12-04T08:57:43.9660117Z * [new branch] gh/tugsbayasgalan/13/head -> origin/gh/tugsbayasgalan/13/head 2025-12-04T08:57:43.9661233Z * [new branch] gh/tugsbayasgalan/13/orig -> origin/gh/tugsbayasgalan/13/orig 2025-12-04T08:57:43.9662924Z * [new branch] gh/tugsbayasgalan/17/base -> origin/gh/tugsbayasgalan/17/base 2025-12-04T08:57:43.9664007Z * [new branch] gh/tugsbayasgalan/17/head -> origin/gh/tugsbayasgalan/17/head 2025-12-04T08:57:43.9665145Z * [new branch] gh/tugsbayasgalan/17/orig -> origin/gh/tugsbayasgalan/17/orig 2025-12-04T08:57:43.9666830Z * [new branch] gh/tugsbayasgalan/2/base -> origin/gh/tugsbayasgalan/2/base 2025-12-04T08:57:43.9667898Z * [new branch] gh/tugsbayasgalan/2/head -> origin/gh/tugsbayasgalan/2/head 2025-12-04T08:57:43.9669089Z * [new branch] gh/tugsbayasgalan/2/orig -> origin/gh/tugsbayasgalan/2/orig 2025-12-04T08:57:43.9671355Z * [new branch] gh/tugsbayasgalan/28/base -> origin/gh/tugsbayasgalan/28/base 2025-12-04T08:57:43.9672466Z * [new branch] gh/tugsbayasgalan/28/head -> origin/gh/tugsbayasgalan/28/head 2025-12-04T08:57:43.9673586Z * [new branch] gh/tugsbayasgalan/28/orig -> origin/gh/tugsbayasgalan/28/orig 2025-12-04T08:57:43.9675117Z * [new branch] gh/tugsbayasgalan/32/base -> origin/gh/tugsbayasgalan/32/base 2025-12-04T08:57:43.9676378Z * [new branch] gh/tugsbayasgalan/32/head -> origin/gh/tugsbayasgalan/32/head 2025-12-04T08:57:43.9677245Z * [new branch] gh/tugsbayasgalan/32/orig -> origin/gh/tugsbayasgalan/32/orig 2025-12-04T08:57:43.9679194Z * [new branch] gh/tugsbayasgalan/35/base -> origin/gh/tugsbayasgalan/35/base 2025-12-04T08:57:43.9683365Z * [new branch] gh/tugsbayasgalan/35/head -> origin/gh/tugsbayasgalan/35/head 2025-12-04T08:57:43.9684461Z * [new branch] gh/tugsbayasgalan/35/orig -> origin/gh/tugsbayasgalan/35/orig 2025-12-04T08:57:43.9686008Z * [new branch] gh/tugsbayasgalan/36/base -> origin/gh/tugsbayasgalan/36/base 2025-12-04T08:57:43.9687155Z * [new branch] gh/tugsbayasgalan/36/head -> origin/gh/tugsbayasgalan/36/head 2025-12-04T08:57:43.9688401Z * [new branch] gh/tugsbayasgalan/36/orig -> origin/gh/tugsbayasgalan/36/orig 2025-12-04T08:57:43.9689833Z * [new branch] gh/tugsbayasgalan/37/base -> origin/gh/tugsbayasgalan/37/base 2025-12-04T08:57:43.9691055Z * [new branch] gh/tugsbayasgalan/37/head -> origin/gh/tugsbayasgalan/37/head 2025-12-04T08:57:43.9710156Z * [new branch] gh/tugsbayasgalan/37/orig -> origin/gh/tugsbayasgalan/37/orig 2025-12-04T08:57:43.9710580Z * [new branch] gh/tugsbayasgalan/43/base -> origin/gh/tugsbayasgalan/43/base 2025-12-04T08:57:43.9710874Z * [new branch] gh/tugsbayasgalan/43/head -> origin/gh/tugsbayasgalan/43/head 2025-12-04T08:57:43.9711167Z * [new branch] gh/tugsbayasgalan/43/orig -> origin/gh/tugsbayasgalan/43/orig 2025-12-04T08:57:43.9711441Z * [new branch] gh/tugsbayasgalan/48/base -> origin/gh/tugsbayasgalan/48/base 2025-12-04T08:57:43.9711728Z * [new branch] gh/tugsbayasgalan/48/head -> origin/gh/tugsbayasgalan/48/head 
2025-12-04T08:57:43.9712002Z * [new branch] gh/tugsbayasgalan/48/orig -> origin/gh/tugsbayasgalan/48/orig 2025-12-04T08:57:43.9712275Z * [new branch] gh/tugsbayasgalan/51/base -> origin/gh/tugsbayasgalan/51/base 2025-12-04T08:57:43.9712564Z * [new branch] gh/tugsbayasgalan/51/head -> origin/gh/tugsbayasgalan/51/head 2025-12-04T08:57:43.9713019Z * [new branch] gh/tugsbayasgalan/51/orig -> origin/gh/tugsbayasgalan/51/orig 2025-12-04T08:57:43.9713370Z * [new branch] gh/tugsbayasgalan/52/base -> origin/gh/tugsbayasgalan/52/base 2025-12-04T08:57:43.9713660Z * [new branch] gh/tugsbayasgalan/52/head -> origin/gh/tugsbayasgalan/52/head 2025-12-04T08:57:43.9713935Z * [new branch] gh/tugsbayasgalan/52/orig -> origin/gh/tugsbayasgalan/52/orig 2025-12-04T08:57:43.9714223Z * [new branch] gh/tugsbayasgalan/53/base -> origin/gh/tugsbayasgalan/53/base 2025-12-04T08:57:43.9714497Z * [new branch] gh/tugsbayasgalan/53/head -> origin/gh/tugsbayasgalan/53/head 2025-12-04T08:57:43.9714777Z * [new branch] gh/tugsbayasgalan/53/orig -> origin/gh/tugsbayasgalan/53/orig 2025-12-04T08:57:43.9715064Z * [new branch] gh/tugsbayasgalan/55/base -> origin/gh/tugsbayasgalan/55/base 2025-12-04T08:57:43.9715339Z * [new branch] gh/tugsbayasgalan/55/head -> origin/gh/tugsbayasgalan/55/head 2025-12-04T08:57:43.9715637Z * [new branch] gh/tugsbayasgalan/55/orig -> origin/gh/tugsbayasgalan/55/orig 2025-12-04T08:57:43.9717199Z * [new branch] gh/tugsbayasgalan/59/base -> origin/gh/tugsbayasgalan/59/base 2025-12-04T08:57:43.9718372Z * [new branch] gh/tugsbayasgalan/59/head -> origin/gh/tugsbayasgalan/59/head 2025-12-04T08:57:43.9719751Z * [new branch] gh/tugsbayasgalan/59/orig -> origin/gh/tugsbayasgalan/59/orig 2025-12-04T08:57:43.9721199Z * [new branch] gh/tugsbayasgalan/6/base -> origin/gh/tugsbayasgalan/6/base 2025-12-04T08:57:43.9722275Z * [new branch] gh/tugsbayasgalan/6/head -> origin/gh/tugsbayasgalan/6/head 2025-12-04T08:57:43.9723357Z * [new branch] gh/tugsbayasgalan/6/orig -> origin/gh/tugsbayasgalan/6/orig 2025-12-04T08:57:43.9724718Z * [new branch] gh/tugsbayasgalan/60/base -> origin/gh/tugsbayasgalan/60/base 2025-12-04T08:57:43.9725841Z * [new branch] gh/tugsbayasgalan/60/head -> origin/gh/tugsbayasgalan/60/head 2025-12-04T08:57:43.9726948Z * [new branch] gh/tugsbayasgalan/60/orig -> origin/gh/tugsbayasgalan/60/orig 2025-12-04T08:57:43.9728823Z * [new branch] gh/tugsbayasgalan/61/base -> origin/gh/tugsbayasgalan/61/base 2025-12-04T08:57:43.9729886Z * [new branch] gh/tugsbayasgalan/61/head -> origin/gh/tugsbayasgalan/61/head 2025-12-04T08:57:43.9731005Z * [new branch] gh/tugsbayasgalan/61/orig -> origin/gh/tugsbayasgalan/61/orig 2025-12-04T08:57:43.9732566Z * [new branch] gh/tugsbayasgalan/63/base -> origin/gh/tugsbayasgalan/63/base 2025-12-04T08:57:43.9734021Z * [new branch] gh/tugsbayasgalan/63/head -> origin/gh/tugsbayasgalan/63/head 2025-12-04T08:57:43.9735264Z * [new branch] gh/tugsbayasgalan/63/orig -> origin/gh/tugsbayasgalan/63/orig 2025-12-04T08:57:43.9736808Z * [new branch] gh/tugsbayasgalan/67/base -> origin/gh/tugsbayasgalan/67/base 2025-12-04T08:57:43.9737920Z * [new branch] gh/tugsbayasgalan/67/head -> origin/gh/tugsbayasgalan/67/head 2025-12-04T08:57:43.9739081Z * [new branch] gh/tugsbayasgalan/67/orig -> origin/gh/tugsbayasgalan/67/orig 2025-12-04T08:57:43.9740805Z * [new branch] gh/tugsbayasgalan/68/base -> origin/gh/tugsbayasgalan/68/base 2025-12-04T08:57:43.9741954Z * [new branch] gh/tugsbayasgalan/68/head -> origin/gh/tugsbayasgalan/68/head 2025-12-04T08:57:43.9743082Z * [new branch] 
gh/tugsbayasgalan/68/orig -> origin/gh/tugsbayasgalan/68/orig 2025-12-04T08:57:43.9745114Z * [new branch] gh/tugsbayasgalan/7/base -> origin/gh/tugsbayasgalan/7/base 2025-12-04T08:57:43.9746372Z * [new branch] gh/tugsbayasgalan/7/head -> origin/gh/tugsbayasgalan/7/head 2025-12-04T08:57:43.9747900Z * [new branch] gh/tugsbayasgalan/7/orig -> origin/gh/tugsbayasgalan/7/orig 2025-12-04T08:57:43.9749834Z * [new branch] gh/tugsbayasgalan/70/base -> origin/gh/tugsbayasgalan/70/base 2025-12-04T08:57:43.9751049Z * [new branch] gh/tugsbayasgalan/70/head -> origin/gh/tugsbayasgalan/70/head 2025-12-04T08:57:43.9752350Z * [new branch] gh/tugsbayasgalan/70/orig -> origin/gh/tugsbayasgalan/70/orig 2025-12-04T08:57:43.9753978Z * [new branch] gh/tugsbayasgalan/71/base -> origin/gh/tugsbayasgalan/71/base 2025-12-04T08:57:43.9755271Z * [new branch] gh/tugsbayasgalan/71/head -> origin/gh/tugsbayasgalan/71/head 2025-12-04T08:57:43.9756862Z * [new branch] gh/tugsbayasgalan/71/orig -> origin/gh/tugsbayasgalan/71/orig 2025-12-04T08:57:43.9758608Z * [new branch] gh/tugsbayasgalan/72/base -> origin/gh/tugsbayasgalan/72/base 2025-12-04T08:57:43.9759688Z * [new branch] gh/tugsbayasgalan/72/head -> origin/gh/tugsbayasgalan/72/head 2025-12-04T08:57:43.9760811Z * [new branch] gh/tugsbayasgalan/72/orig -> origin/gh/tugsbayasgalan/72/orig 2025-12-04T08:57:43.9762305Z * [new branch] gh/tugsbayasgalan/73/base -> origin/gh/tugsbayasgalan/73/base 2025-12-04T08:57:43.9763609Z * [new branch] gh/tugsbayasgalan/73/head -> origin/gh/tugsbayasgalan/73/head 2025-12-04T08:57:43.9764705Z * [new branch] gh/tugsbayasgalan/73/orig -> origin/gh/tugsbayasgalan/73/orig 2025-12-04T08:57:43.9766421Z * [new branch] gh/tugsbayasgalan/74/base -> origin/gh/tugsbayasgalan/74/base 2025-12-04T08:57:43.9767534Z * [new branch] gh/tugsbayasgalan/74/head -> origin/gh/tugsbayasgalan/74/head 2025-12-04T08:57:43.9768780Z * [new branch] gh/tugsbayasgalan/74/orig -> origin/gh/tugsbayasgalan/74/orig 2025-12-04T08:57:43.9770276Z * [new branch] gh/tugsbayasgalan/75/base -> origin/gh/tugsbayasgalan/75/base 2025-12-04T08:57:43.9771406Z * [new branch] gh/tugsbayasgalan/75/head -> origin/gh/tugsbayasgalan/75/head 2025-12-04T08:57:43.9772451Z * [new branch] gh/tugsbayasgalan/75/orig -> origin/gh/tugsbayasgalan/75/orig 2025-12-04T08:57:43.9774203Z * [new branch] gh/tugsbayasgalan/76/base -> origin/gh/tugsbayasgalan/76/base 2025-12-04T08:57:43.9775366Z * [new branch] gh/tugsbayasgalan/76/head -> origin/gh/tugsbayasgalan/76/head 2025-12-04T08:57:43.9776479Z * [new branch] gh/tugsbayasgalan/76/orig -> origin/gh/tugsbayasgalan/76/orig 2025-12-04T08:57:43.9778226Z * [new branch] gh/tugsbayasgalan/77/base -> origin/gh/tugsbayasgalan/77/base 2025-12-04T08:57:43.9779603Z * [new branch] gh/tugsbayasgalan/77/head -> origin/gh/tugsbayasgalan/77/head 2025-12-04T08:57:43.9780706Z * [new branch] gh/tugsbayasgalan/77/orig -> origin/gh/tugsbayasgalan/77/orig 2025-12-04T08:57:43.9782358Z * [new branch] gh/tugsbayasgalan/78/base -> origin/gh/tugsbayasgalan/78/base 2025-12-04T08:57:43.9783666Z * [new branch] gh/tugsbayasgalan/78/head -> origin/gh/tugsbayasgalan/78/head 2025-12-04T08:57:43.9784892Z * [new branch] gh/tugsbayasgalan/78/orig -> origin/gh/tugsbayasgalan/78/orig 2025-12-04T08:57:43.9786449Z * [new branch] gh/tugsbayasgalan/79/base -> origin/gh/tugsbayasgalan/79/base 2025-12-04T08:57:43.9787590Z * [new branch] gh/tugsbayasgalan/79/head -> origin/gh/tugsbayasgalan/79/head 2025-12-04T08:57:43.9788962Z * [new branch] gh/tugsbayasgalan/79/orig -> origin/gh/tugsbayasgalan/79/orig 
2025-12-04T08:57:43.9790520Z * [new branch] gh/tugsbayasgalan/8/base -> origin/gh/tugsbayasgalan/8/base 2025-12-04T08:57:43.9791690Z * [new branch] gh/tugsbayasgalan/8/head -> origin/gh/tugsbayasgalan/8/head 2025-12-04T08:57:43.9792835Z * [new branch] gh/tugsbayasgalan/8/orig -> origin/gh/tugsbayasgalan/8/orig 2025-12-04T08:57:43.9794390Z * [new branch] gh/tugsbayasgalan/80/base -> origin/gh/tugsbayasgalan/80/base 2025-12-04T08:57:43.9795255Z * [new branch] gh/tugsbayasgalan/80/head -> origin/gh/tugsbayasgalan/80/head 2025-12-04T08:57:43.9796377Z * [new branch] gh/tugsbayasgalan/80/orig -> origin/gh/tugsbayasgalan/80/orig 2025-12-04T08:57:43.9798001Z * [new branch] gh/tugsbayasgalan/81/base -> origin/gh/tugsbayasgalan/81/base 2025-12-04T08:57:43.9799037Z * [new branch] gh/tugsbayasgalan/81/head -> origin/gh/tugsbayasgalan/81/head 2025-12-04T08:57:43.9800269Z * [new branch] gh/tugsbayasgalan/81/orig -> origin/gh/tugsbayasgalan/81/orig 2025-12-04T08:57:43.9802435Z * [new branch] gh/tugsbayasgalan/82/base -> origin/gh/tugsbayasgalan/82/base 2025-12-04T08:57:43.9803773Z * [new branch] gh/tugsbayasgalan/82/head -> origin/gh/tugsbayasgalan/82/head 2025-12-04T08:57:43.9804967Z * [new branch] gh/tugsbayasgalan/82/orig -> origin/gh/tugsbayasgalan/82/orig 2025-12-04T08:57:43.9806304Z * [new branch] gh/tugsbayasgalan/83/base -> origin/gh/tugsbayasgalan/83/base 2025-12-04T08:57:43.9807467Z * [new branch] gh/tugsbayasgalan/83/head -> origin/gh/tugsbayasgalan/83/head 2025-12-04T08:57:43.9808548Z * [new branch] gh/tugsbayasgalan/83/orig -> origin/gh/tugsbayasgalan/83/orig 2025-12-04T08:57:43.9809963Z * [new branch] gh/tugsbayasgalan/84/base -> origin/gh/tugsbayasgalan/84/base 2025-12-04T08:57:43.9811009Z * [new branch] gh/tugsbayasgalan/84/head -> origin/gh/tugsbayasgalan/84/head 2025-12-04T08:57:43.9812093Z * [new branch] gh/tugsbayasgalan/84/orig -> origin/gh/tugsbayasgalan/84/orig 2025-12-04T08:57:43.9813685Z * [new branch] gh/tugsbayasgalan/85/base -> origin/gh/tugsbayasgalan/85/base 2025-12-04T08:57:43.9814876Z * [new branch] gh/tugsbayasgalan/85/head -> origin/gh/tugsbayasgalan/85/head 2025-12-04T08:57:43.9816133Z * [new branch] gh/tugsbayasgalan/85/orig -> origin/gh/tugsbayasgalan/85/orig 2025-12-04T08:57:43.9817692Z * [new branch] gh/tugsbayasgalan/86/base -> origin/gh/tugsbayasgalan/86/base 2025-12-04T08:57:43.9818900Z * [new branch] gh/tugsbayasgalan/86/head -> origin/gh/tugsbayasgalan/86/head 2025-12-04T08:57:43.9820069Z * [new branch] gh/tugsbayasgalan/86/orig -> origin/gh/tugsbayasgalan/86/orig 2025-12-04T08:57:43.9822022Z * [new branch] gh/tugsbayasgalan/87/base -> origin/gh/tugsbayasgalan/87/base 2025-12-04T08:57:43.9823288Z * [new branch] gh/tugsbayasgalan/87/head -> origin/gh/tugsbayasgalan/87/head 2025-12-04T08:57:43.9824432Z * [new branch] gh/tugsbayasgalan/87/orig -> origin/gh/tugsbayasgalan/87/orig 2025-12-04T08:57:43.9826176Z * [new branch] gh/tugsbayasgalan/88/base -> origin/gh/tugsbayasgalan/88/base 2025-12-04T08:57:43.9827267Z * [new branch] gh/tugsbayasgalan/88/head -> origin/gh/tugsbayasgalan/88/head 2025-12-04T08:57:43.9828416Z * [new branch] gh/tugsbayasgalan/88/orig -> origin/gh/tugsbayasgalan/88/orig 2025-12-04T08:57:43.9830413Z * [new branch] gh/tugsbayasgalan/89/base -> origin/gh/tugsbayasgalan/89/base 2025-12-04T08:57:43.9831610Z * [new branch] gh/tugsbayasgalan/89/head -> origin/gh/tugsbayasgalan/89/head 2025-12-04T08:57:43.9832845Z * [new branch] gh/tugsbayasgalan/89/orig -> origin/gh/tugsbayasgalan/89/orig 2025-12-04T08:57:43.9834328Z * [new branch] 
gh/tugsbayasgalan/9/base -> origin/gh/tugsbayasgalan/9/base 2025-12-04T08:57:43.9835326Z * [new branch] gh/tugsbayasgalan/9/head -> origin/gh/tugsbayasgalan/9/head 2025-12-04T08:57:43.9836438Z * [new branch] gh/tugsbayasgalan/9/orig -> origin/gh/tugsbayasgalan/9/orig 2025-12-04T08:57:43.9838122Z * [new branch] gh/tugsbayasgalan/90/base -> origin/gh/tugsbayasgalan/90/base 2025-12-04T08:57:43.9839266Z * [new branch] gh/tugsbayasgalan/90/head -> origin/gh/tugsbayasgalan/90/head 2025-12-04T08:57:43.9840214Z * [new branch] gh/tugsbayasgalan/90/orig -> origin/gh/tugsbayasgalan/90/orig 2025-12-04T08:57:43.9841949Z * [new branch] gh/tugsbayasgalan/91/base -> origin/gh/tugsbayasgalan/91/base 2025-12-04T08:57:43.9842981Z * [new branch] gh/tugsbayasgalan/91/head -> origin/gh/tugsbayasgalan/91/head 2025-12-04T08:57:43.9844032Z * [new branch] gh/tugsbayasgalan/91/orig -> origin/gh/tugsbayasgalan/91/orig 2025-12-04T08:57:43.9845551Z * [new branch] gh/tugsbayasgalan/92/base -> origin/gh/tugsbayasgalan/92/base 2025-12-04T08:57:43.9846729Z * [new branch] gh/tugsbayasgalan/92/head -> origin/gh/tugsbayasgalan/92/head 2025-12-04T08:57:43.9847921Z * [new branch] gh/tugsbayasgalan/92/orig -> origin/gh/tugsbayasgalan/92/orig 2025-12-04T08:57:43.9849513Z * [new branch] gh/tugsbayasgalan/93/base -> origin/gh/tugsbayasgalan/93/base 2025-12-04T08:57:43.9850698Z * [new branch] gh/tugsbayasgalan/93/head -> origin/gh/tugsbayasgalan/93/head 2025-12-04T08:57:43.9851796Z * [new branch] gh/tugsbayasgalan/93/orig -> origin/gh/tugsbayasgalan/93/orig 2025-12-04T08:57:43.9853936Z * [new branch] gh/v0i0/14/base -> origin/gh/v0i0/14/base 2025-12-04T08:57:43.9855002Z * [new branch] gh/v0i0/14/head -> origin/gh/v0i0/14/head 2025-12-04T08:57:43.9856248Z * [new branch] gh/v0i0/14/orig -> origin/gh/v0i0/14/orig 2025-12-04T08:57:43.9857595Z * [new branch] gh/v0i0/15/base -> origin/gh/v0i0/15/base 2025-12-04T08:57:43.9858928Z * [new branch] gh/v0i0/15/head -> origin/gh/v0i0/15/head 2025-12-04T08:57:43.9860065Z * [new branch] gh/v0i0/15/orig -> origin/gh/v0i0/15/orig 2025-12-04T08:57:43.9861597Z * [new branch] gh/v0i0/16/base -> origin/gh/v0i0/16/base 2025-12-04T08:57:43.9862859Z * [new branch] gh/v0i0/16/head -> origin/gh/v0i0/16/head 2025-12-04T08:57:43.9863962Z * [new branch] gh/v0i0/16/orig -> origin/gh/v0i0/16/orig 2025-12-04T08:57:43.9865527Z * [new branch] gh/v0i0/17/base -> origin/gh/v0i0/17/base 2025-12-04T08:57:43.9866686Z * [new branch] gh/v0i0/17/head -> origin/gh/v0i0/17/head 2025-12-04T08:57:43.9868433Z * [new branch] gh/v0i0/17/orig -> origin/gh/v0i0/17/orig 2025-12-04T08:57:43.9869989Z * [new branch] gh/v0i0/18/base -> origin/gh/v0i0/18/base 2025-12-04T08:57:43.9871157Z * [new branch] gh/v0i0/18/head -> origin/gh/v0i0/18/head 2025-12-04T08:57:43.9872295Z * [new branch] gh/v0i0/18/orig -> origin/gh/v0i0/18/orig 2025-12-04T08:57:43.9873775Z * [new branch] gh/v0i0/19/base -> origin/gh/v0i0/19/base 2025-12-04T08:57:43.9874867Z * [new branch] gh/v0i0/19/head -> origin/gh/v0i0/19/head 2025-12-04T08:57:43.9875966Z * [new branch] gh/v0i0/19/orig -> origin/gh/v0i0/19/orig 2025-12-04T08:57:43.9878345Z * [new branch] gh/vishal9-team/1/base -> origin/gh/vishal9-team/1/base 2025-12-04T08:57:43.9880003Z * [new branch] gh/vishal9-team/1/head -> origin/gh/vishal9-team/1/head 2025-12-04T08:57:43.9881291Z * [new branch] gh/vishal9-team/2/base -> origin/gh/vishal9-team/2/base 2025-12-04T08:57:43.9882438Z * [new branch] gh/vishal9-team/2/head -> origin/gh/vishal9-team/2/head 2025-12-04T08:57:43.9883556Z * [new branch] gh/vishal9-team/2/orig 
-> origin/gh/vishal9-team/2/orig 2025-12-04T08:57:43.9885088Z * [new branch] gh/vishal9-team/3/base -> origin/gh/vishal9-team/3/base 2025-12-04T08:57:43.9886971Z * [new branch] gh/vishal9-team/3/head -> origin/gh/vishal9-team/3/head 2025-12-04T08:57:43.9888027Z * [new branch] gh/vishal9-team/3/orig -> origin/gh/vishal9-team/3/orig 2025-12-04T08:57:43.9889419Z * [new branch] gh/vishal9-team/4/base -> origin/gh/vishal9-team/4/base 2025-12-04T08:57:43.9890492Z * [new branch] gh/vishal9-team/4/head -> origin/gh/vishal9-team/4/head 2025-12-04T08:57:43.9891743Z * [new branch] gh/vishal9-team/4/orig -> origin/gh/vishal9-team/4/orig 2025-12-04T08:57:43.9893947Z * [new branch] gh/vkuzo/1/next -> origin/gh/vkuzo/1/next 2025-12-04T08:57:43.9895459Z * [new branch] gh/vkuzo/2/next -> origin/gh/vkuzo/2/next 2025-12-04T08:57:43.9896928Z * [new branch] gh/vkuzo/3/next -> origin/gh/vkuzo/3/next 2025-12-04T08:57:43.9898738Z * [new branch] gh/wconstab/424/base -> origin/gh/wconstab/424/base 2025-12-04T08:57:43.9899968Z * [new branch] gh/wconstab/424/head -> origin/gh/wconstab/424/head 2025-12-04T08:57:43.9901088Z * [new branch] gh/wconstab/424/orig -> origin/gh/wconstab/424/orig 2025-12-04T08:57:43.9902598Z * [new branch] gh/wconstab/435/base -> origin/gh/wconstab/435/base 2025-12-04T08:57:43.9903793Z * [new branch] gh/wconstab/435/head -> origin/gh/wconstab/435/head 2025-12-04T08:57:43.9904947Z * [new branch] gh/wconstab/435/orig -> origin/gh/wconstab/435/orig 2025-12-04T08:57:43.9906540Z * [new branch] gh/wconstab/444/base -> origin/gh/wconstab/444/base 2025-12-04T08:57:43.9907759Z * [new branch] gh/wconstab/444/head -> origin/gh/wconstab/444/head 2025-12-04T08:57:43.9908895Z * [new branch] gh/wconstab/444/orig -> origin/gh/wconstab/444/orig 2025-12-04T08:57:43.9910361Z * [new branch] gh/wconstab/447/base -> origin/gh/wconstab/447/base 2025-12-04T08:57:43.9911470Z * [new branch] gh/wconstab/447/head -> origin/gh/wconstab/447/head 2025-12-04T08:57:43.9912539Z * [new branch] gh/wconstab/447/orig -> origin/gh/wconstab/447/orig 2025-12-04T08:57:43.9914050Z * [new branch] gh/wconstab/448/base -> origin/gh/wconstab/448/base 2025-12-04T08:57:43.9915165Z * [new branch] gh/wconstab/448/head -> origin/gh/wconstab/448/head 2025-12-04T08:57:43.9916256Z * [new branch] gh/wconstab/448/orig -> origin/gh/wconstab/448/orig 2025-12-04T08:57:43.9917579Z * [new branch] gh/wconstab/449/base -> origin/gh/wconstab/449/base 2025-12-04T08:57:43.9918695Z * [new branch] gh/wconstab/449/head -> origin/gh/wconstab/449/head 2025-12-04T08:57:43.9919842Z * [new branch] gh/wconstab/449/orig -> origin/gh/wconstab/449/orig 2025-12-04T08:57:43.9921135Z * [new branch] gh/wconstab/450/base -> origin/gh/wconstab/450/base 2025-12-04T08:57:43.9922445Z * [new branch] gh/wconstab/450/head -> origin/gh/wconstab/450/head 2025-12-04T08:57:43.9923532Z * [new branch] gh/wconstab/450/orig -> origin/gh/wconstab/450/orig 2025-12-04T08:57:43.9925005Z * [new branch] gh/wconstab/451/base -> origin/gh/wconstab/451/base 2025-12-04T08:57:43.9926200Z * [new branch] gh/wconstab/451/head -> origin/gh/wconstab/451/head 2025-12-04T08:57:43.9927270Z * [new branch] gh/wconstab/451/orig -> origin/gh/wconstab/451/orig 2025-12-04T08:57:43.9928836Z * [new branch] gh/wconstab/452/base -> origin/gh/wconstab/452/base 2025-12-04T08:57:43.9929864Z * [new branch] gh/wconstab/452/head -> origin/gh/wconstab/452/head 2025-12-04T08:57:43.9931000Z * [new branch] gh/wconstab/452/orig -> origin/gh/wconstab/452/orig 2025-12-04T08:57:43.9932427Z * [new branch] gh/wconstab/453/base -> 
origin/gh/wconstab/453/base 2025-12-04T08:57:43.9933877Z * [new branch] gh/wconstab/453/head -> origin/gh/wconstab/453/head 2025-12-04T08:57:43.9935137Z * [new branch] gh/wconstab/453/orig -> origin/gh/wconstab/453/orig 2025-12-04T08:57:43.9936461Z * [new branch] gh/wconstab/454/base -> origin/gh/wconstab/454/base 2025-12-04T08:57:43.9937696Z * [new branch] gh/wconstab/454/head -> origin/gh/wconstab/454/head 2025-12-04T08:57:43.9938816Z * [new branch] gh/wconstab/454/orig -> origin/gh/wconstab/454/orig 2025-12-04T08:57:43.9940783Z * [new branch] gh/wconstab/455/base -> origin/gh/wconstab/455/base 2025-12-04T08:57:43.9941902Z * [new branch] gh/wconstab/455/head -> origin/gh/wconstab/455/head 2025-12-04T08:57:43.9943037Z * [new branch] gh/wconstab/455/orig -> origin/gh/wconstab/455/orig 2025-12-04T08:57:43.9945389Z * [new branch] gh/wconstab/456/base -> origin/gh/wconstab/456/base 2025-12-04T08:57:43.9946953Z * [new branch] gh/wconstab/456/head -> origin/gh/wconstab/456/head 2025-12-04T08:57:43.9948132Z * [new branch] gh/wconstab/456/orig -> origin/gh/wconstab/456/orig 2025-12-04T08:57:43.9949594Z * [new branch] gh/wconstab/457/base -> origin/gh/wconstab/457/base 2025-12-04T08:57:43.9950671Z * [new branch] gh/wconstab/457/head -> origin/gh/wconstab/457/head 2025-12-04T08:57:43.9951827Z * [new branch] gh/wconstab/457/orig -> origin/gh/wconstab/457/orig 2025-12-04T08:57:43.9953289Z * [new branch] gh/wconstab/458/base -> origin/gh/wconstab/458/base 2025-12-04T08:57:43.9954530Z * [new branch] gh/wconstab/458/head -> origin/gh/wconstab/458/head 2025-12-04T08:57:43.9955627Z * [new branch] gh/wconstab/458/orig -> origin/gh/wconstab/458/orig 2025-12-04T08:57:43.9957003Z * [new branch] gh/wconstab/459/base -> origin/gh/wconstab/459/base 2025-12-04T08:57:43.9958134Z * [new branch] gh/wconstab/459/head -> origin/gh/wconstab/459/head 2025-12-04T08:57:43.9959147Z * [new branch] gh/wconstab/459/orig -> origin/gh/wconstab/459/orig 2025-12-04T08:57:43.9961189Z * [new branch] gh/wconstab/460/base -> origin/gh/wconstab/460/base 2025-12-04T08:57:43.9962531Z * [new branch] gh/wconstab/460/head -> origin/gh/wconstab/460/head 2025-12-04T08:57:43.9963775Z * [new branch] gh/wconstab/460/orig -> origin/gh/wconstab/460/orig 2025-12-04T08:57:43.9965382Z * [new branch] gh/wconstab/461/base -> origin/gh/wconstab/461/base 2025-12-04T08:57:43.9966530Z * [new branch] gh/wconstab/461/head -> origin/gh/wconstab/461/head 2025-12-04T08:57:43.9967620Z * [new branch] gh/wconstab/461/orig -> origin/gh/wconstab/461/orig 2025-12-04T08:57:43.9968967Z * [new branch] gh/wconstab/462/base -> origin/gh/wconstab/462/base 2025-12-04T08:57:43.9970234Z * [new branch] gh/wconstab/462/head -> origin/gh/wconstab/462/head 2025-12-04T08:57:43.9971395Z * [new branch] gh/wconstab/462/orig -> origin/gh/wconstab/462/orig 2025-12-04T08:57:43.9972946Z * [new branch] gh/wconstab/463/base -> origin/gh/wconstab/463/base 2025-12-04T08:57:43.9974576Z * [new branch] gh/wconstab/463/head -> origin/gh/wconstab/463/head 2025-12-04T08:57:43.9975736Z * [new branch] gh/wconstab/463/orig -> origin/gh/wconstab/463/orig 2025-12-04T08:57:43.9977790Z * [new branch] gh/wconstab/464/base -> origin/gh/wconstab/464/base 2025-12-04T08:57:43.9979085Z * [new branch] gh/wconstab/464/head -> origin/gh/wconstab/464/head 2025-12-04T08:57:43.9980571Z * [new branch] gh/wconstab/464/orig -> origin/gh/wconstab/464/orig 2025-12-04T08:57:43.9981921Z * [new branch] gh/wconstab/465/base -> origin/gh/wconstab/465/base 2025-12-04T08:57:43.9983131Z * [new branch] gh/wconstab/465/head -> 
origin/gh/wconstab/465/head 2025-12-04T08:57:43.9984186Z * [new branch] gh/wconstab/465/orig -> origin/gh/wconstab/465/orig 2025-12-04T08:57:43.9985802Z * [new branch] gh/wconstab/466/base -> origin/gh/wconstab/466/base 2025-12-04T08:57:43.9986965Z * [new branch] gh/wconstab/466/head -> origin/gh/wconstab/466/head 2025-12-04T08:57:43.9988464Z * [new branch] gh/wconstab/466/orig -> origin/gh/wconstab/466/orig 2025-12-04T08:57:43.9990533Z * [new branch] gh/wconstab/467/base -> origin/gh/wconstab/467/base 2025-12-04T08:57:43.9991696Z * [new branch] gh/wconstab/467/head -> origin/gh/wconstab/467/head 2025-12-04T08:57:43.9993267Z * [new branch] gh/wconstab/467/orig -> origin/gh/wconstab/467/orig 2025-12-04T08:57:43.9994765Z * [new branch] gh/wconstab/468/base -> origin/gh/wconstab/468/base 2025-12-04T08:57:43.9995869Z * [new branch] gh/wconstab/468/head -> origin/gh/wconstab/468/head 2025-12-04T08:57:43.9996998Z * [new branch] gh/wconstab/468/orig -> origin/gh/wconstab/468/orig 2025-12-04T08:57:43.9998837Z * [new branch] gh/weifengpy/39/base -> origin/gh/weifengpy/39/base 2025-12-04T08:57:43.9999907Z * [new branch] gh/weifengpy/39/head -> origin/gh/weifengpy/39/head 2025-12-04T08:57:44.0001059Z * [new branch] gh/weifengpy/39/orig -> origin/gh/weifengpy/39/orig 2025-12-04T08:57:44.0002838Z * [new branch] gh/weifengpy/40/base -> origin/gh/weifengpy/40/base 2025-12-04T08:57:44.0004335Z * [new branch] gh/weifengpy/40/head -> origin/gh/weifengpy/40/head 2025-12-04T08:57:44.0006029Z * [new branch] gh/weifengpy/40/orig -> origin/gh/weifengpy/40/orig 2025-12-04T08:57:44.0007689Z * [new branch] gh/weifengpy/41/base -> origin/gh/weifengpy/41/base 2025-12-04T08:57:44.0008845Z * [new branch] gh/weifengpy/41/head -> origin/gh/weifengpy/41/head 2025-12-04T08:57:44.0010098Z * [new branch] gh/weifengpy/41/orig -> origin/gh/weifengpy/41/orig 2025-12-04T08:57:44.0011966Z * [new branch] gh/williamwen42/250/base -> origin/gh/williamwen42/250/base 2025-12-04T08:57:44.0013138Z * [new branch] gh/williamwen42/250/head -> origin/gh/williamwen42/250/head 2025-12-04T08:57:44.0014554Z * [new branch] gh/williamwen42/250/orig -> origin/gh/williamwen42/250/orig 2025-12-04T08:57:44.0016079Z * [new branch] gh/williamwen42/279/base -> origin/gh/williamwen42/279/base 2025-12-04T08:57:44.0017385Z * [new branch] gh/williamwen42/279/head -> origin/gh/williamwen42/279/head 2025-12-04T08:57:44.0018449Z * [new branch] gh/williamwen42/279/orig -> origin/gh/williamwen42/279/orig 2025-12-04T08:57:44.0020104Z * [new branch] gh/williamwen42/282/base -> origin/gh/williamwen42/282/base 2025-12-04T08:57:44.0021209Z * [new branch] gh/williamwen42/282/head -> origin/gh/williamwen42/282/head 2025-12-04T08:57:44.0022397Z * [new branch] gh/williamwen42/282/orig -> origin/gh/williamwen42/282/orig 2025-12-04T08:57:44.0023935Z * [new branch] gh/williamwen42/287/base -> origin/gh/williamwen42/287/base 2025-12-04T08:57:44.0025084Z * [new branch] gh/williamwen42/287/head -> origin/gh/williamwen42/287/head 2025-12-04T08:57:44.0026814Z * [new branch] gh/williamwen42/287/orig -> origin/gh/williamwen42/287/orig 2025-12-04T08:57:44.0028318Z * [new branch] gh/williamwen42/288/base -> origin/gh/williamwen42/288/base 2025-12-04T08:57:44.0029490Z * [new branch] gh/williamwen42/288/head -> origin/gh/williamwen42/288/head 2025-12-04T08:57:44.0030448Z * [new branch] gh/williamwen42/288/orig -> origin/gh/williamwen42/288/orig 2025-12-04T08:57:44.0032162Z * [new branch] gh/williamwen42/296/base -> origin/gh/williamwen42/296/base 2025-12-04T08:57:44.0033402Z * [new 
branch] gh/williamwen42/296/head -> origin/gh/williamwen42/296/head 2025-12-04T08:57:44.0034506Z * [new branch] gh/williamwen42/296/orig -> origin/gh/williamwen42/296/orig 2025-12-04T08:57:44.0036007Z * [new branch] gh/williamwen42/297/base -> origin/gh/williamwen42/297/base 2025-12-04T08:57:44.0037135Z * [new branch] gh/williamwen42/297/head -> origin/gh/williamwen42/297/head 2025-12-04T08:57:44.0038263Z * [new branch] gh/williamwen42/297/orig -> origin/gh/williamwen42/297/orig 2025-12-04T08:57:44.0039732Z * [new branch] gh/williamwen42/306/base -> origin/gh/williamwen42/306/base 2025-12-04T08:57:44.0040900Z * [new branch] gh/williamwen42/306/head -> origin/gh/williamwen42/306/head 2025-12-04T08:57:44.0042119Z * [new branch] gh/williamwen42/306/orig -> origin/gh/williamwen42/306/orig 2025-12-04T08:57:44.0043673Z * [new branch] gh/williamwen42/309/base -> origin/gh/williamwen42/309/base 2025-12-04T08:57:44.0044819Z * [new branch] gh/williamwen42/309/head -> origin/gh/williamwen42/309/head 2025-12-04T08:57:44.0045929Z * [new branch] gh/williamwen42/309/orig -> origin/gh/williamwen42/309/orig 2025-12-04T08:57:44.0047379Z * [new branch] gh/williamwen42/310/base -> origin/gh/williamwen42/310/base 2025-12-04T08:57:44.0048450Z * [new branch] gh/williamwen42/310/head -> origin/gh/williamwen42/310/head 2025-12-04T08:57:44.0050041Z * [new branch] gh/williamwen42/310/orig -> origin/gh/williamwen42/310/orig 2025-12-04T08:57:44.0052958Z * [new branch] gh/williamwen42/311/base -> origin/gh/williamwen42/311/base 2025-12-04T08:57:44.0054401Z * [new branch] gh/williamwen42/311/head -> origin/gh/williamwen42/311/head 2025-12-04T08:57:44.0055550Z * [new branch] gh/williamwen42/311/orig -> origin/gh/williamwen42/311/orig 2025-12-04T08:57:44.0056940Z * [new branch] gh/williamwen42/319/base -> origin/gh/williamwen42/319/base 2025-12-04T08:57:44.0058088Z * [new branch] gh/williamwen42/319/head -> origin/gh/williamwen42/319/head 2025-12-04T08:57:44.0059208Z * [new branch] gh/williamwen42/319/orig -> origin/gh/williamwen42/319/orig 2025-12-04T08:57:44.0060768Z * [new branch] gh/williamwen42/325/base -> origin/gh/williamwen42/325/base 2025-12-04T08:57:44.0062083Z * [new branch] gh/williamwen42/325/head -> origin/gh/williamwen42/325/head 2025-12-04T08:57:44.0063409Z * [new branch] gh/williamwen42/325/orig -> origin/gh/williamwen42/325/orig 2025-12-04T08:57:44.0064925Z * [new branch] gh/williamwen42/326/base -> origin/gh/williamwen42/326/base 2025-12-04T08:57:44.0066235Z * [new branch] gh/williamwen42/326/head -> origin/gh/williamwen42/326/head 2025-12-04T08:57:44.0067414Z * [new branch] gh/williamwen42/326/orig -> origin/gh/williamwen42/326/orig 2025-12-04T08:57:44.0068994Z * [new branch] gh/williamwen42/327/base -> origin/gh/williamwen42/327/base 2025-12-04T08:57:44.0070096Z * [new branch] gh/williamwen42/327/head -> origin/gh/williamwen42/327/head 2025-12-04T08:57:44.0071146Z * [new branch] gh/williamwen42/327/orig -> origin/gh/williamwen42/327/orig 2025-12-04T08:57:44.0072698Z * [new branch] gh/williamwen42/328/base -> origin/gh/williamwen42/328/base 2025-12-04T08:57:44.0073898Z * [new branch] gh/williamwen42/328/head -> origin/gh/williamwen42/328/head 2025-12-04T08:57:44.0075084Z * [new branch] gh/williamwen42/328/orig -> origin/gh/williamwen42/328/orig 2025-12-04T08:57:44.0076951Z * [new branch] gh/williamwen42/329/base -> origin/gh/williamwen42/329/base 2025-12-04T08:57:44.0078218Z * [new branch] gh/williamwen42/329/head -> origin/gh/williamwen42/329/head 2025-12-04T08:57:44.0082578Z * [new branch] 
gh/williamwen42/329/orig -> origin/gh/williamwen42/329/orig 2025-12-04T08:57:44.0084298Z * [new branch] gh/williamwen42/330/base -> origin/gh/williamwen42/330/base 2025-12-04T08:57:44.0085550Z * [new branch] gh/williamwen42/330/head -> origin/gh/williamwen42/330/head 2025-12-04T08:57:44.0086703Z * [new branch] gh/williamwen42/330/orig -> origin/gh/williamwen42/330/orig 2025-12-04T08:57:44.0088284Z * [new branch] gh/williamwen42/331/base -> origin/gh/williamwen42/331/base 2025-12-04T08:57:44.0089507Z * [new branch] gh/williamwen42/331/head -> origin/gh/williamwen42/331/head 2025-12-04T08:57:44.0090596Z * [new branch] gh/williamwen42/331/orig -> origin/gh/williamwen42/331/orig 2025-12-04T08:57:44.0092074Z * [new branch] gh/williamwen42/332/base -> origin/gh/williamwen42/332/base 2025-12-04T08:57:44.0093249Z * [new branch] gh/williamwen42/332/head -> origin/gh/williamwen42/332/head 2025-12-04T08:57:44.0094660Z * [new branch] gh/williamwen42/332/orig -> origin/gh/williamwen42/332/orig 2025-12-04T08:57:44.0096446Z * [new branch] gh/williamwen42/333/base -> origin/gh/williamwen42/333/base 2025-12-04T08:57:44.0097547Z * [new branch] gh/williamwen42/333/head -> origin/gh/williamwen42/333/head 2025-12-04T08:57:44.0098716Z * [new branch] gh/williamwen42/333/orig -> origin/gh/williamwen42/333/orig 2025-12-04T08:57:44.0100277Z * [new branch] gh/williamwen42/334/base -> origin/gh/williamwen42/334/base 2025-12-04T08:57:44.0101401Z * [new branch] gh/williamwen42/334/head -> origin/gh/williamwen42/334/head 2025-12-04T08:57:44.0102533Z * [new branch] gh/williamwen42/334/orig -> origin/gh/williamwen42/334/orig 2025-12-04T08:57:44.0104308Z * [new branch] gh/williamwen42/335/base -> origin/gh/williamwen42/335/base 2025-12-04T08:57:44.0110165Z * [new branch] gh/williamwen42/335/head -> origin/gh/williamwen42/335/head 2025-12-04T08:57:44.0110434Z * [new branch] gh/williamwen42/335/orig -> origin/gh/williamwen42/335/orig 2025-12-04T08:57:44.0112036Z * [new branch] gh/williamwen42/336/base -> origin/gh/williamwen42/336/base 2025-12-04T08:57:44.0113097Z * [new branch] gh/williamwen42/336/head -> origin/gh/williamwen42/336/head 2025-12-04T08:57:44.0114146Z * [new branch] gh/williamwen42/336/orig -> origin/gh/williamwen42/336/orig 2025-12-04T08:57:44.0115689Z * [new branch] gh/williamwen42/337/base -> origin/gh/williamwen42/337/base 2025-12-04T08:57:44.0116805Z * [new branch] gh/williamwen42/337/head -> origin/gh/williamwen42/337/head 2025-12-04T08:57:44.0118377Z * [new branch] gh/williamwen42/337/orig -> origin/gh/williamwen42/337/orig 2025-12-04T08:57:44.0119947Z * [new branch] gh/williamwen42/338/base -> origin/gh/williamwen42/338/base 2025-12-04T08:57:44.0121063Z * [new branch] gh/williamwen42/338/head -> origin/gh/williamwen42/338/head 2025-12-04T08:57:44.0122187Z * [new branch] gh/williamwen42/338/orig -> origin/gh/williamwen42/338/orig 2025-12-04T08:57:44.0123841Z * [new branch] gh/williamwen42/339/base -> origin/gh/williamwen42/339/base 2025-12-04T08:57:44.0124945Z * [new branch] gh/williamwen42/339/head -> origin/gh/williamwen42/339/head 2025-12-04T08:57:44.0126040Z * [new branch] gh/williamwen42/339/orig -> origin/gh/williamwen42/339/orig 2025-12-04T08:57:44.0127806Z * [new branch] gh/williamwen42/340/base -> origin/gh/williamwen42/340/base 2025-12-04T08:57:44.0128653Z * [new branch] gh/williamwen42/340/head -> origin/gh/williamwen42/340/head 2025-12-04T08:57:44.0129746Z * [new branch] gh/williamwen42/340/orig -> origin/gh/williamwen42/340/orig 2025-12-04T08:57:44.0131318Z * [new branch] 
gh/williamwen42/341/base -> origin/gh/williamwen42/341/base 2025-12-04T08:57:44.0132460Z * [new branch] gh/williamwen42/341/head -> origin/gh/williamwen42/341/head 2025-12-04T08:57:44.0133893Z * [new branch] gh/williamwen42/341/orig -> origin/gh/williamwen42/341/orig 2025-12-04T08:57:44.0135439Z * [new branch] gh/williamwen42/342/base -> origin/gh/williamwen42/342/base 2025-12-04T08:57:44.0136586Z * [new branch] gh/williamwen42/342/head -> origin/gh/williamwen42/342/head 2025-12-04T08:57:44.0137923Z * [new branch] gh/williamwen42/342/orig -> origin/gh/williamwen42/342/orig 2025-12-04T08:57:44.0139664Z * [new branch] gh/williamwen42/343/base -> origin/gh/williamwen42/343/base 2025-12-04T08:57:44.0140807Z * [new branch] gh/williamwen42/343/head -> origin/gh/williamwen42/343/head 2025-12-04T08:57:44.0141933Z * [new branch] gh/williamwen42/343/orig -> origin/gh/williamwen42/343/orig 2025-12-04T08:57:44.0143541Z * [new branch] gh/williamwen42/344/base -> origin/gh/williamwen42/344/base 2025-12-04T08:57:44.0144675Z * [new branch] gh/williamwen42/344/head -> origin/gh/williamwen42/344/head 2025-12-04T08:57:44.0145914Z * [new branch] gh/williamwen42/344/orig -> origin/gh/williamwen42/344/orig 2025-12-04T08:57:44.0147456Z * [new branch] gh/williamwen42/345/base -> origin/gh/williamwen42/345/base 2025-12-04T08:57:44.0148593Z * [new branch] gh/williamwen42/345/head -> origin/gh/williamwen42/345/head 2025-12-04T08:57:44.0149680Z * [new branch] gh/williamwen42/345/orig -> origin/gh/williamwen42/345/orig 2025-12-04T08:57:44.0151163Z * [new branch] gh/williamwen42/346/base -> origin/gh/williamwen42/346/base 2025-12-04T08:57:44.0152343Z * [new branch] gh/williamwen42/346/head -> origin/gh/williamwen42/346/head 2025-12-04T08:57:44.0153426Z * [new branch] gh/williamwen42/346/orig -> origin/gh/williamwen42/346/orig 2025-12-04T08:57:44.0155176Z * [new branch] gh/williamwen42/347/base -> origin/gh/williamwen42/347/base 2025-12-04T08:57:44.0156205Z * [new branch] gh/williamwen42/347/head -> origin/gh/williamwen42/347/head 2025-12-04T08:57:44.0157329Z * [new branch] gh/williamwen42/347/orig -> origin/gh/williamwen42/347/orig 2025-12-04T08:57:44.0158768Z * [new branch] gh/williamwen42/348/base -> origin/gh/williamwen42/348/base 2025-12-04T08:57:44.0159808Z * [new branch] gh/williamwen42/348/head -> origin/gh/williamwen42/348/head 2025-12-04T08:57:44.0160881Z * [new branch] gh/williamwen42/348/orig -> origin/gh/williamwen42/348/orig 2025-12-04T08:57:44.0162228Z * [new branch] gh/williamwen42/349/base -> origin/gh/williamwen42/349/base 2025-12-04T08:57:44.0163408Z * [new branch] gh/williamwen42/349/head -> origin/gh/williamwen42/349/head 2025-12-04T08:57:44.0164492Z * [new branch] gh/williamwen42/349/orig -> origin/gh/williamwen42/349/orig 2025-12-04T08:57:44.0166035Z * [new branch] gh/williamwen42/350/base -> origin/gh/williamwen42/350/base 2025-12-04T08:57:44.0167107Z * [new branch] gh/williamwen42/350/head -> origin/gh/williamwen42/350/head 2025-12-04T08:57:44.0168694Z * [new branch] gh/williamwen42/350/orig -> origin/gh/williamwen42/350/orig 2025-12-04T08:57:44.0170389Z * [new branch] gh/williamwen42/351/base -> origin/gh/williamwen42/351/base 2025-12-04T08:57:44.0171510Z * [new branch] gh/williamwen42/351/head -> origin/gh/williamwen42/351/head 2025-12-04T08:57:44.0172640Z * [new branch] gh/williamwen42/351/orig -> origin/gh/williamwen42/351/orig 2025-12-04T08:57:44.0174630Z * [new branch] gh/williamwen42/352/base -> origin/gh/williamwen42/352/base 2025-12-04T08:57:44.0175690Z * [new branch] 
gh/williamwen42/352/head -> origin/gh/williamwen42/352/head 2025-12-04T08:57:44.0176821Z * [new branch] gh/williamwen42/352/orig -> origin/gh/williamwen42/352/orig 2025-12-04T08:57:44.0178517Z * [new branch] gh/williamwen42/353/base -> origin/gh/williamwen42/353/base 2025-12-04T08:57:44.0179996Z * [new branch] gh/williamwen42/353/head -> origin/gh/williamwen42/353/head 2025-12-04T08:57:44.0181123Z * [new branch] gh/williamwen42/353/orig -> origin/gh/williamwen42/353/orig 2025-12-04T08:57:44.0182645Z * [new branch] gh/williamwen42/354/base -> origin/gh/williamwen42/354/base 2025-12-04T08:57:44.0183924Z * [new branch] gh/williamwen42/354/head -> origin/gh/williamwen42/354/head 2025-12-04T08:57:44.0185086Z * [new branch] gh/williamwen42/354/orig -> origin/gh/williamwen42/354/orig 2025-12-04T08:57:44.0186660Z * [new branch] gh/williamwen42/355/base -> origin/gh/williamwen42/355/base 2025-12-04T08:57:44.0187773Z * [new branch] gh/williamwen42/355/head -> origin/gh/williamwen42/355/head 2025-12-04T08:57:44.0188898Z * [new branch] gh/williamwen42/355/orig -> origin/gh/williamwen42/355/orig 2025-12-04T08:57:44.0190558Z * [new branch] gh/williamwen42/356/base -> origin/gh/williamwen42/356/base 2025-12-04T08:57:44.0192105Z * [new branch] gh/williamwen42/356/head -> origin/gh/williamwen42/356/head 2025-12-04T08:57:44.0193222Z * [new branch] gh/williamwen42/356/orig -> origin/gh/williamwen42/356/orig 2025-12-04T08:57:44.0194727Z * [new branch] gh/williamwen42/357/base -> origin/gh/williamwen42/357/base 2025-12-04T08:57:44.0195865Z * [new branch] gh/williamwen42/357/head -> origin/gh/williamwen42/357/head 2025-12-04T08:57:44.0196973Z * [new branch] gh/williamwen42/357/orig -> origin/gh/williamwen42/357/orig 2025-12-04T08:57:44.0198439Z * [new branch] gh/williamwen42/358/base -> origin/gh/williamwen42/358/base 2025-12-04T08:57:44.0199522Z * [new branch] gh/williamwen42/358/head -> origin/gh/williamwen42/358/head 2025-12-04T08:57:44.0201148Z * [new branch] gh/williamwen42/358/orig -> origin/gh/williamwen42/358/orig 2025-12-04T08:57:44.0203538Z * [new branch] gh/xmfan/169/base -> origin/gh/xmfan/169/base 2025-12-04T08:57:44.0204649Z * [new branch] gh/xmfan/169/head -> origin/gh/xmfan/169/head 2025-12-04T08:57:44.0206120Z * [new branch] gh/xmfan/170/base -> origin/gh/xmfan/170/base 2025-12-04T08:57:44.0207120Z * [new branch] gh/xmfan/170/head -> origin/gh/xmfan/170/head 2025-12-04T08:57:44.0208573Z * [new branch] gh/xmfan/274/base -> origin/gh/xmfan/274/base 2025-12-04T08:57:44.0209629Z * [new branch] gh/xmfan/274/head -> origin/gh/xmfan/274/head 2025-12-04T08:57:44.0210749Z * [new branch] gh/xmfan/274/orig -> origin/gh/xmfan/274/orig 2025-12-04T08:57:44.0212190Z * [new branch] gh/xmfan/277/base -> origin/gh/xmfan/277/base 2025-12-04T08:57:44.0213310Z * [new branch] gh/xmfan/277/head -> origin/gh/xmfan/277/head 2025-12-04T08:57:44.0214655Z * [new branch] gh/xmfan/277/orig -> origin/gh/xmfan/277/orig 2025-12-04T08:57:44.0216641Z * [new branch] gh/xmfan/301/base -> origin/gh/xmfan/301/base 2025-12-04T08:57:44.0217957Z * [new branch] gh/xmfan/301/head -> origin/gh/xmfan/301/head 2025-12-04T08:57:44.0218804Z * [new branch] gh/xmfan/301/orig -> origin/gh/xmfan/301/orig 2025-12-04T08:57:44.0220379Z * [new branch] gh/xmfan/304/base -> origin/gh/xmfan/304/base 2025-12-04T08:57:44.0221443Z * [new branch] gh/xmfan/304/head -> origin/gh/xmfan/304/head 2025-12-04T08:57:44.0222547Z * [new branch] gh/xmfan/304/orig -> origin/gh/xmfan/304/orig 2025-12-04T08:57:44.0224042Z * [new branch] gh/xmfan/309/base -> 
origin/gh/xmfan/309/base 2025-12-04T08:57:44.0225127Z * [new branch] gh/xmfan/309/head -> origin/gh/xmfan/309/head 2025-12-04T08:57:44.0226322Z * [new branch] gh/xmfan/309/orig -> origin/gh/xmfan/309/orig 2025-12-04T08:57:44.0227774Z * [new branch] gh/xmfan/310/base -> origin/gh/xmfan/310/base 2025-12-04T08:57:44.0228897Z * [new branch] gh/xmfan/310/head -> origin/gh/xmfan/310/head 2025-12-04T08:57:44.0229945Z * [new branch] gh/xmfan/310/orig -> origin/gh/xmfan/310/orig 2025-12-04T08:57:44.0231486Z * [new branch] gh/xmfan/311/base -> origin/gh/xmfan/311/base 2025-12-04T08:57:44.0232545Z * [new branch] gh/xmfan/311/head -> origin/gh/xmfan/311/head 2025-12-04T08:57:44.0233634Z * [new branch] gh/xmfan/311/orig -> origin/gh/xmfan/311/orig 2025-12-04T08:57:44.0235060Z * [new branch] gh/xmfan/312/base -> origin/gh/xmfan/312/base 2025-12-04T08:57:44.0236151Z * [new branch] gh/xmfan/312/head -> origin/gh/xmfan/312/head 2025-12-04T08:57:44.0237215Z * [new branch] gh/xmfan/312/orig -> origin/gh/xmfan/312/orig 2025-12-04T08:57:44.0238619Z * [new branch] gh/xmfan/313/base -> origin/gh/xmfan/313/base 2025-12-04T08:57:44.0239696Z * [new branch] gh/xmfan/313/head -> origin/gh/xmfan/313/head 2025-12-04T08:57:44.0240807Z * [new branch] gh/xmfan/313/orig -> origin/gh/xmfan/313/orig 2025-12-04T08:57:44.0242589Z * [new branch] gh/xuanzhang816/27/base -> origin/gh/xuanzhang816/27/base 2025-12-04T08:57:44.0243684Z * [new branch] gh/xuanzhang816/27/head -> origin/gh/xuanzhang816/27/head 2025-12-04T08:57:44.0244759Z * [new branch] gh/xuanzhang816/27/orig -> origin/gh/xuanzhang816/27/orig 2025-12-04T08:57:44.0246495Z * [new branch] gh/xuanzhang816/32/base -> origin/gh/xuanzhang816/32/base 2025-12-04T08:57:44.0247534Z * [new branch] gh/xuanzhang816/32/head -> origin/gh/xuanzhang816/32/head 2025-12-04T08:57:44.0248639Z * [new branch] gh/xuanzhang816/32/orig -> origin/gh/xuanzhang816/32/orig 2025-12-04T08:57:44.0250073Z * [new branch] gh/xuanzhang816/33/base -> origin/gh/xuanzhang816/33/base 2025-12-04T08:57:44.0251232Z * [new branch] gh/xuanzhang816/33/head -> origin/gh/xuanzhang816/33/head 2025-12-04T08:57:44.0252280Z * [new branch] gh/xuanzhang816/33/orig -> origin/gh/xuanzhang816/33/orig 2025-12-04T08:57:44.0254477Z * [new branch] gh/xuanzhang816/34/base -> origin/gh/xuanzhang816/34/base 2025-12-04T08:57:44.0255637Z * [new branch] gh/xuanzhang816/34/head -> origin/gh/xuanzhang816/34/head 2025-12-04T08:57:44.0256840Z * [new branch] gh/xuanzhang816/34/orig -> origin/gh/xuanzhang816/34/orig 2025-12-04T08:57:44.0258633Z * [new branch] gh/xuanzhang816/35/base -> origin/gh/xuanzhang816/35/base 2025-12-04T08:57:44.0259746Z * [new branch] gh/xuanzhang816/35/head -> origin/gh/xuanzhang816/35/head 2025-12-04T08:57:44.0260818Z * [new branch] gh/xuanzhang816/35/orig -> origin/gh/xuanzhang816/35/orig 2025-12-04T08:57:44.0262885Z * [new branch] gh/yanbing-j/11/base -> origin/gh/yanbing-j/11/base 2025-12-04T08:57:44.0263908Z * [new branch] gh/yanbing-j/11/head -> origin/gh/yanbing-j/11/head 2025-12-04T08:57:44.0265037Z * [new branch] gh/yanbing-j/11/orig -> origin/gh/yanbing-j/11/orig 2025-12-04T08:57:44.0266609Z * [new branch] gh/yanbing-j/12/base -> origin/gh/yanbing-j/12/base 2025-12-04T08:57:44.0267690Z * [new branch] gh/yanbing-j/12/head -> origin/gh/yanbing-j/12/head 2025-12-04T08:57:44.0268787Z * [new branch] gh/yanbing-j/12/orig -> origin/gh/yanbing-j/12/orig 2025-12-04T08:57:44.0270251Z * [new branch] gh/yanbing-j/13/base -> origin/gh/yanbing-j/13/base 2025-12-04T08:57:44.0271564Z * [new branch] gh/yanbing-j/13/head -> 
origin/gh/yanbing-j/13/head 2025-12-04T08:57:44.0272639Z * [new branch] gh/yanbing-j/13/orig -> origin/gh/yanbing-j/13/orig 2025-12-04T08:57:44.0274084Z * [new branch] gh/yanbing-j/14/base -> origin/gh/yanbing-j/14/base 2025-12-04T08:57:44.0275207Z * [new branch] gh/yanbing-j/14/head -> origin/gh/yanbing-j/14/head 2025-12-04T08:57:44.0276417Z * [new branch] gh/yanbing-j/14/orig -> origin/gh/yanbing-j/14/orig 2025-12-04T08:57:44.0277849Z * [new branch] gh/yanbing-j/15/base -> origin/gh/yanbing-j/15/base 2025-12-04T08:57:44.0279268Z * [new branch] gh/yanbing-j/15/head -> origin/gh/yanbing-j/15/head 2025-12-04T08:57:44.0280532Z * [new branch] gh/yanbing-j/15/orig -> origin/gh/yanbing-j/15/orig 2025-12-04T08:57:44.0281909Z * [new branch] gh/yanbing-j/18/base -> origin/gh/yanbing-j/18/base 2025-12-04T08:57:44.0283076Z * [new branch] gh/yanbing-j/18/head -> origin/gh/yanbing-j/18/head 2025-12-04T08:57:44.0284222Z * [new branch] gh/yanbing-j/18/orig -> origin/gh/yanbing-j/18/orig 2025-12-04T08:57:44.0285691Z * [new branch] gh/yanbing-j/19/base -> origin/gh/yanbing-j/19/base 2025-12-04T08:57:44.0286806Z * [new branch] gh/yanbing-j/19/head -> origin/gh/yanbing-j/19/head 2025-12-04T08:57:44.0287893Z * [new branch] gh/yanbing-j/19/orig -> origin/gh/yanbing-j/19/orig 2025-12-04T08:57:44.0289467Z * [new branch] gh/yanbing-j/20/base -> origin/gh/yanbing-j/20/base 2025-12-04T08:57:44.0290577Z * [new branch] gh/yanbing-j/20/head -> origin/gh/yanbing-j/20/head 2025-12-04T08:57:44.0291914Z * [new branch] gh/yanbing-j/20/orig -> origin/gh/yanbing-j/20/orig 2025-12-04T08:57:44.0293615Z * [new branch] gh/yanbing-j/21/base -> origin/gh/yanbing-j/21/base 2025-12-04T08:57:44.0294836Z * [new branch] gh/yanbing-j/21/head -> origin/gh/yanbing-j/21/head 2025-12-04T08:57:44.0296364Z * [new branch] gh/yanbing-j/22/base -> origin/gh/yanbing-j/22/base 2025-12-04T08:57:44.0297479Z * [new branch] gh/yanbing-j/22/head -> origin/gh/yanbing-j/22/head 2025-12-04T08:57:44.0298594Z * [new branch] gh/yanbing-j/22/orig -> origin/gh/yanbing-j/22/orig 2025-12-04T08:57:44.0300154Z * [new branch] gh/yanbing-j/23/base -> origin/gh/yanbing-j/23/base 2025-12-04T08:57:44.0301281Z * [new branch] gh/yanbing-j/23/head -> origin/gh/yanbing-j/23/head 2025-12-04T08:57:44.0302417Z * [new branch] gh/yanbing-j/23/orig -> origin/gh/yanbing-j/23/orig 2025-12-04T08:57:44.0303866Z * [new branch] gh/yanbing-j/24/base -> origin/gh/yanbing-j/24/base 2025-12-04T08:57:44.0304991Z * [new branch] gh/yanbing-j/24/head -> origin/gh/yanbing-j/24/head 2025-12-04T08:57:44.0306195Z * [new branch] gh/yanbing-j/24/orig -> origin/gh/yanbing-j/24/orig 2025-12-04T08:57:44.0307860Z * [new branch] gh/yanbing-j/25/base -> origin/gh/yanbing-j/25/base 2025-12-04T08:57:44.0308903Z * [new branch] gh/yanbing-j/25/head -> origin/gh/yanbing-j/25/head 2025-12-04T08:57:44.0309967Z * [new branch] gh/yanbing-j/25/orig -> origin/gh/yanbing-j/25/orig 2025-12-04T08:57:44.0311403Z * [new branch] gh/yanbing-j/26/base -> origin/gh/yanbing-j/26/base 2025-12-04T08:57:44.0312497Z * [new branch] gh/yanbing-j/26/head -> origin/gh/yanbing-j/26/head 2025-12-04T08:57:44.0313577Z * [new branch] gh/yanbing-j/26/orig -> origin/gh/yanbing-j/26/orig 2025-12-04T08:57:44.0315576Z * [new branch] gh/yang-yu-hang/1/base -> origin/gh/yang-yu-hang/1/base 2025-12-04T08:57:44.0316840Z * [new branch] gh/yang-yu-hang/1/head -> origin/gh/yang-yu-hang/1/head 2025-12-04T08:57:44.0318122Z * [new branch] gh/yang-yu-hang/1/orig -> origin/gh/yang-yu-hang/1/orig 2025-12-04T08:57:44.0319559Z * [new branch] 
gh/yang-yu-hang/2/base -> origin/gh/yang-yu-hang/2/base 2025-12-04T08:57:44.0320933Z * [new branch] gh/yang-yu-hang/2/head -> origin/gh/yang-yu-hang/2/head 2025-12-04T08:57:44.0322355Z * [new branch] gh/yang-yu-hang/2/orig -> origin/gh/yang-yu-hang/2/orig 2025-12-04T08:57:44.0323893Z * [new branch] gh/yang-yu-hang/3/base -> origin/gh/yang-yu-hang/3/base 2025-12-04T08:57:44.0324968Z * [new branch] gh/yang-yu-hang/3/head -> origin/gh/yang-yu-hang/3/head 2025-12-04T08:57:44.0326169Z * [new branch] gh/yang-yu-hang/3/orig -> origin/gh/yang-yu-hang/3/orig 2025-12-04T08:57:44.0327907Z * [new branch] gh/yangw-dev/12/base -> origin/gh/yangw-dev/12/base 2025-12-04T08:57:44.0329058Z * [new branch] gh/yangw-dev/12/head -> origin/gh/yangw-dev/12/head 2025-12-04T08:57:44.0330085Z * [new branch] gh/yangw-dev/12/orig -> origin/gh/yangw-dev/12/orig 2025-12-04T08:57:44.0331541Z * [new branch] gh/yangw-dev/13/base -> origin/gh/yangw-dev/13/base 2025-12-04T08:57:44.0332736Z * [new branch] gh/yangw-dev/13/head -> origin/gh/yangw-dev/13/head 2025-12-04T08:57:44.0334159Z * [new branch] gh/yangw-dev/13/orig -> origin/gh/yangw-dev/13/orig 2025-12-04T08:57:44.0335614Z * [new branch] gh/yangw-dev/14/base -> origin/gh/yangw-dev/14/base 2025-12-04T08:57:44.0336833Z * [new branch] gh/yangw-dev/14/head -> origin/gh/yangw-dev/14/head 2025-12-04T08:57:44.0338137Z * [new branch] gh/yangw-dev/14/orig -> origin/gh/yangw-dev/14/orig 2025-12-04T08:57:44.0339664Z * [new branch] gh/yangw-dev/15/base -> origin/gh/yangw-dev/15/base 2025-12-04T08:57:44.0340807Z * [new branch] gh/yangw-dev/15/head -> origin/gh/yangw-dev/15/head 2025-12-04T08:57:44.0341933Z * [new branch] gh/yangw-dev/15/orig -> origin/gh/yangw-dev/15/orig 2025-12-04T08:57:44.0343390Z * [new branch] gh/yangw-dev/19/base -> origin/gh/yangw-dev/19/base 2025-12-04T08:57:44.0344510Z * [new branch] gh/yangw-dev/19/head -> origin/gh/yangw-dev/19/head 2025-12-04T08:57:44.0345618Z * [new branch] gh/yangw-dev/19/orig -> origin/gh/yangw-dev/19/orig 2025-12-04T08:57:44.0347133Z * [new branch] gh/yangw-dev/26/base -> origin/gh/yangw-dev/26/base 2025-12-04T08:57:44.0348243Z * [new branch] gh/yangw-dev/26/head -> origin/gh/yangw-dev/26/head 2025-12-04T08:57:44.0349307Z * [new branch] gh/yangw-dev/26/orig -> origin/gh/yangw-dev/26/orig 2025-12-04T08:57:44.0350723Z * [new branch] gh/yangw-dev/27/base -> origin/gh/yangw-dev/27/base 2025-12-04T08:57:44.0351970Z * [new branch] gh/yangw-dev/27/head -> origin/gh/yangw-dev/27/head 2025-12-04T08:57:44.0353167Z * [new branch] gh/yangw-dev/27/orig -> origin/gh/yangw-dev/27/orig 2025-12-04T08:57:44.0354967Z * [new branch] gh/ydwu4/292/base -> origin/gh/ydwu4/292/base 2025-12-04T08:57:44.0355973Z * [new branch] gh/ydwu4/292/head -> origin/gh/ydwu4/292/head 2025-12-04T08:57:44.0357054Z * [new branch] gh/ydwu4/292/orig -> origin/gh/ydwu4/292/orig 2025-12-04T08:57:44.0358510Z * [new branch] gh/ydwu4/294/base -> origin/gh/ydwu4/294/base 2025-12-04T08:57:44.0359591Z * [new branch] gh/ydwu4/294/head -> origin/gh/ydwu4/294/head 2025-12-04T08:57:44.0360643Z * [new branch] gh/ydwu4/294/orig -> origin/gh/ydwu4/294/orig 2025-12-04T08:57:44.0362293Z * [new branch] gh/ydwu4/295/base -> origin/gh/ydwu4/295/base 2025-12-04T08:57:44.0363426Z * [new branch] gh/ydwu4/295/head -> origin/gh/ydwu4/295/head 2025-12-04T08:57:44.0364502Z * [new branch] gh/ydwu4/295/orig -> origin/gh/ydwu4/295/orig 2025-12-04T08:57:44.0365951Z * [new branch] gh/ydwu4/296/base -> origin/gh/ydwu4/296/base 2025-12-04T08:57:44.0366967Z * [new branch] gh/ydwu4/296/head -> 
origin/gh/ydwu4/296/head 2025-12-04T08:57:44.0368098Z * [new branch] gh/ydwu4/296/orig -> origin/gh/ydwu4/296/orig 2025-12-04T08:57:44.0369613Z * [new branch] gh/ydwu4/306/base -> origin/gh/ydwu4/306/base 2025-12-04T08:57:44.0370778Z * [new branch] gh/ydwu4/306/head -> origin/gh/ydwu4/306/head 2025-12-04T08:57:44.0371943Z * [new branch] gh/ydwu4/306/orig -> origin/gh/ydwu4/306/orig 2025-12-04T08:57:44.0373359Z * [new branch] gh/ydwu4/312/base -> origin/gh/ydwu4/312/base 2025-12-04T08:57:44.0374772Z * [new branch] gh/ydwu4/312/head -> origin/gh/ydwu4/312/head 2025-12-04T08:57:44.0375881Z * [new branch] gh/ydwu4/312/orig -> origin/gh/ydwu4/312/orig 2025-12-04T08:57:44.0377318Z * [new branch] gh/ydwu4/322/base -> origin/gh/ydwu4/322/base 2025-12-04T08:57:44.0378415Z * [new branch] gh/ydwu4/322/head -> origin/gh/ydwu4/322/head 2025-12-04T08:57:44.0379838Z * [new branch] gh/ydwu4/322/orig -> origin/gh/ydwu4/322/orig 2025-12-04T08:57:44.0381435Z * [new branch] gh/ydwu4/327/base -> origin/gh/ydwu4/327/base 2025-12-04T08:57:44.0382605Z * [new branch] gh/ydwu4/327/head -> origin/gh/ydwu4/327/head 2025-12-04T08:57:44.0383726Z * [new branch] gh/ydwu4/327/orig -> origin/gh/ydwu4/327/orig 2025-12-04T08:57:44.0385380Z * [new branch] gh/ydwu4/328/base -> origin/gh/ydwu4/328/base 2025-12-04T08:57:44.0386440Z * [new branch] gh/ydwu4/328/head -> origin/gh/ydwu4/328/head 2025-12-04T08:57:44.0387519Z * [new branch] gh/ydwu4/328/orig -> origin/gh/ydwu4/328/orig 2025-12-04T08:57:44.0388948Z * [new branch] gh/ydwu4/329/base -> origin/gh/ydwu4/329/base 2025-12-04T08:57:44.0390055Z * [new branch] gh/ydwu4/329/head -> origin/gh/ydwu4/329/head 2025-12-04T08:57:44.0391252Z * [new branch] gh/ydwu4/329/orig -> origin/gh/ydwu4/329/orig 2025-12-04T08:57:44.0392811Z * [new branch] gh/ydwu4/330/base -> origin/gh/ydwu4/330/base 2025-12-04T08:57:44.0393812Z * [new branch] gh/ydwu4/330/head -> origin/gh/ydwu4/330/head 2025-12-04T08:57:44.0394895Z * [new branch] gh/ydwu4/330/orig -> origin/gh/ydwu4/330/orig 2025-12-04T08:57:44.0396368Z * [new branch] gh/ydwu4/331/base -> origin/gh/ydwu4/331/base 2025-12-04T08:57:44.0397446Z * [new branch] gh/ydwu4/331/head -> origin/gh/ydwu4/331/head 2025-12-04T08:57:44.0398650Z * [new branch] gh/ydwu4/331/orig -> origin/gh/ydwu4/331/orig 2025-12-04T08:57:44.0399818Z * [new branch] gh/ydwu4/332/base -> origin/gh/ydwu4/332/base 2025-12-04T08:57:44.0401038Z * [new branch] gh/ydwu4/332/head -> origin/gh/ydwu4/332/head 2025-12-04T08:57:44.0402099Z * [new branch] gh/ydwu4/332/orig -> origin/gh/ydwu4/332/orig 2025-12-04T08:57:44.0403378Z * [new branch] gh/ydwu4/333/base -> origin/gh/ydwu4/333/base 2025-12-04T08:57:44.0404458Z * [new branch] gh/ydwu4/333/head -> origin/gh/ydwu4/333/head 2025-12-04T08:57:44.0405517Z * [new branch] gh/ydwu4/333/orig -> origin/gh/ydwu4/333/orig 2025-12-04T08:57:44.0406881Z * [new branch] gh/ydwu4/334/base -> origin/gh/ydwu4/334/base 2025-12-04T08:57:44.0407940Z * [new branch] gh/ydwu4/334/head -> origin/gh/ydwu4/334/head 2025-12-04T08:57:44.0409023Z * [new branch] gh/ydwu4/334/orig -> origin/gh/ydwu4/334/orig 2025-12-04T08:57:44.0410401Z * [new branch] gh/ydwu4/335/base -> origin/gh/ydwu4/335/base 2025-12-04T08:57:44.0411499Z * [new branch] gh/ydwu4/335/head -> origin/gh/ydwu4/335/head 2025-12-04T08:57:44.0412582Z * [new branch] gh/ydwu4/335/orig -> origin/gh/ydwu4/335/orig 2025-12-04T08:57:44.0414790Z * [new branch] gh/ydwu4/337/base -> origin/gh/ydwu4/337/base 2025-12-04T08:57:44.0415917Z * [new branch] gh/ydwu4/337/head -> origin/gh/ydwu4/337/head 
2025-12-04T08:57:44.0417011Z * [new branch] gh/ydwu4/337/orig -> origin/gh/ydwu4/337/orig 2025-12-04T08:57:44.0418970Z * [new branch] gh/ydwu4/339/base -> origin/gh/ydwu4/339/base 2025-12-04T08:57:44.0420163Z * [new branch] gh/ydwu4/339/head -> origin/gh/ydwu4/339/head 2025-12-04T08:57:44.0421276Z * [new branch] gh/ydwu4/339/orig -> origin/gh/ydwu4/339/orig 2025-12-04T08:57:44.0423152Z * [new branch] gh/yf225/133/base -> origin/gh/yf225/133/base 2025-12-04T08:57:44.0424330Z * [new branch] gh/yf225/133/head -> origin/gh/yf225/133/head 2025-12-04T08:57:44.0426096Z * [new branch] gh/yf225/93/base -> origin/gh/yf225/93/base 2025-12-04T08:57:44.0427165Z * [new branch] gh/yf225/93/head -> origin/gh/yf225/93/head 2025-12-04T08:57:44.0429492Z * [new branch] gh/yifuwang/152/base -> origin/gh/yifuwang/152/base 2025-12-04T08:57:44.0430938Z * [new branch] gh/yifuwang/152/head -> origin/gh/yifuwang/152/head 2025-12-04T08:57:44.0432098Z * [new branch] gh/yifuwang/152/orig -> origin/gh/yifuwang/152/orig 2025-12-04T08:57:44.0433547Z * [new branch] gh/yifuwang/195/base -> origin/gh/yifuwang/195/base 2025-12-04T08:57:44.0434662Z * [new branch] gh/yifuwang/195/head -> origin/gh/yifuwang/195/head 2025-12-04T08:57:44.0435757Z * [new branch] gh/yifuwang/195/orig -> origin/gh/yifuwang/195/orig 2025-12-04T08:57:44.0437537Z * [new branch] gh/yiming0416/1/base -> origin/gh/yiming0416/1/base 2025-12-04T08:57:44.0438620Z * [new branch] gh/yiming0416/1/head -> origin/gh/yiming0416/1/head 2025-12-04T08:57:44.0440052Z * [new branch] gh/yiming0416/2/base -> origin/gh/yiming0416/2/base 2025-12-04T08:57:44.0441047Z * [new branch] gh/yiming0416/2/head -> origin/gh/yiming0416/2/head 2025-12-04T08:57:44.0442877Z * [new branch] gh/yushangdi/1/base -> origin/gh/yushangdi/1/base 2025-12-04T08:57:44.0443977Z * [new branch] gh/yushangdi/1/head -> origin/gh/yushangdi/1/head 2025-12-04T08:57:44.0445516Z * [new branch] gh/yushangdi/10/base -> origin/gh/yushangdi/10/base 2025-12-04T08:57:44.0446517Z * [new branch] gh/yushangdi/10/head -> origin/gh/yushangdi/10/head 2025-12-04T08:57:44.0447612Z * [new branch] gh/yushangdi/10/orig -> origin/gh/yushangdi/10/orig 2025-12-04T08:57:44.0449023Z * [new branch] gh/yushangdi/11/base -> origin/gh/yushangdi/11/base 2025-12-04T08:57:44.0450093Z * [new branch] gh/yushangdi/11/head -> origin/gh/yushangdi/11/head 2025-12-04T08:57:44.0451213Z * [new branch] gh/yushangdi/11/orig -> origin/gh/yushangdi/11/orig 2025-12-04T08:57:44.0452506Z * [new branch] gh/yushangdi/2/base -> origin/gh/yushangdi/2/base 2025-12-04T08:57:44.0453938Z * [new branch] gh/yushangdi/2/head -> origin/gh/yushangdi/2/head 2025-12-04T08:57:44.0455539Z * [new branch] gh/yushangdi/7/base -> origin/gh/yushangdi/7/base 2025-12-04T08:57:44.0456635Z * [new branch] gh/yushangdi/7/head -> origin/gh/yushangdi/7/head 2025-12-04T08:57:44.0457762Z * [new branch] gh/yushangdi/7/orig -> origin/gh/yushangdi/7/orig 2025-12-04T08:57:44.0459579Z * [new branch] gh/yushangdi/8/base -> origin/gh/yushangdi/8/base 2025-12-04T08:57:44.0460917Z * [new branch] gh/yushangdi/8/head -> origin/gh/yushangdi/8/head 2025-12-04T08:57:44.0462117Z * [new branch] gh/yushangdi/8/orig -> origin/gh/yushangdi/8/orig 2025-12-04T08:57:44.0463463Z * [new branch] gh/yushangdi/9/base -> origin/gh/yushangdi/9/base 2025-12-04T08:57:44.0464597Z * [new branch] gh/yushangdi/9/head -> origin/gh/yushangdi/9/head 2025-12-04T08:57:44.0465944Z * [new branch] gh/yushangdi/9/orig -> origin/gh/yushangdi/9/orig 2025-12-04T08:57:44.0467610Z * [new branch] gh/zklaus/19/base -> 
origin/gh/zklaus/19/base 2025-12-04T08:57:44.0468890Z * [new branch] gh/zklaus/19/head -> origin/gh/zklaus/19/head 2025-12-04T08:57:44.0470038Z * [new branch] gh/zklaus/19/orig -> origin/gh/zklaus/19/orig 2025-12-04T08:57:44.0472393Z * [new branch] gh/zklaus/20/base -> origin/gh/zklaus/20/base 2025-12-04T08:57:44.0473511Z * [new branch] gh/zklaus/20/head -> origin/gh/zklaus/20/head 2025-12-04T08:57:44.0474611Z * [new branch] gh/zklaus/20/orig -> origin/gh/zklaus/20/orig 2025-12-04T08:57:44.0476061Z * [new branch] gh/zklaus/21/base -> origin/gh/zklaus/21/base 2025-12-04T08:57:44.0477158Z * [new branch] gh/zklaus/21/head -> origin/gh/zklaus/21/head 2025-12-04T08:57:44.0478236Z * [new branch] gh/zklaus/21/orig -> origin/gh/zklaus/21/orig 2025-12-04T08:57:44.0480131Z * [new branch] gh/zklaus/22/base -> origin/gh/zklaus/22/base 2025-12-04T08:57:44.0481291Z * [new branch] gh/zklaus/22/head -> origin/gh/zklaus/22/head 2025-12-04T08:57:44.0482893Z * [new branch] gh/zklaus/22/orig -> origin/gh/zklaus/22/orig 2025-12-04T08:57:44.0484389Z * [new branch] gh/zklaus/23/base -> origin/gh/zklaus/23/base 2025-12-04T08:57:44.0485617Z * [new branch] gh/zklaus/23/head -> origin/gh/zklaus/23/head 2025-12-04T08:57:44.0486753Z * [new branch] gh/zklaus/23/orig -> origin/gh/zklaus/23/orig 2025-12-04T08:57:44.0488180Z * [new branch] gh/zklaus/24/base -> origin/gh/zklaus/24/base 2025-12-04T08:57:44.0489700Z * [new branch] gh/zklaus/24/head -> origin/gh/zklaus/24/head 2025-12-04T08:57:44.0490809Z * [new branch] gh/zklaus/24/orig -> origin/gh/zklaus/24/orig 2025-12-04T08:57:44.0492867Z * [new branch] gh/zou3519/1197/base -> origin/gh/zou3519/1197/base 2025-12-04T08:57:44.0494360Z * [new branch] gh/zou3519/1197/head -> origin/gh/zou3519/1197/head 2025-12-04T08:57:44.0495406Z * [new branch] gh/zou3519/1197/orig -> origin/gh/zou3519/1197/orig 2025-12-04T08:57:44.0497252Z * [new branch] gh/zou3519/1199/base -> origin/gh/zou3519/1199/base 2025-12-04T08:57:44.0498446Z * [new branch] gh/zou3519/1199/head -> origin/gh/zou3519/1199/head 2025-12-04T08:57:44.0499593Z * [new branch] gh/zou3519/1199/orig -> origin/gh/zou3519/1199/orig 2025-12-04T08:57:44.0501270Z * [new branch] gh/zou3519/1200/base -> origin/gh/zou3519/1200/base 2025-12-04T08:57:44.0502446Z * [new branch] gh/zou3519/1200/head -> origin/gh/zou3519/1200/head 2025-12-04T08:57:44.0503575Z * [new branch] gh/zou3519/1200/orig -> origin/gh/zou3519/1200/orig 2025-12-04T08:57:44.0505150Z * [new branch] gh/zou3519/1201/base -> origin/gh/zou3519/1201/base 2025-12-04T08:57:44.0506283Z * [new branch] gh/zou3519/1201/head -> origin/gh/zou3519/1201/head 2025-12-04T08:57:44.0507363Z * [new branch] gh/zou3519/1201/orig -> origin/gh/zou3519/1201/orig 2025-12-04T08:57:44.0509118Z * [new branch] gh/zou3519/1202/base -> origin/gh/zou3519/1202/base 2025-12-04T08:57:44.0510231Z * [new branch] gh/zou3519/1202/head -> origin/gh/zou3519/1202/head 2025-12-04T08:57:44.0511351Z * [new branch] gh/zou3519/1202/orig -> origin/gh/zou3519/1202/orig 2025-12-04T08:57:44.0513115Z * [new branch] gh/zpcore/1/base -> origin/gh/zpcore/1/base 2025-12-04T08:57:44.0514192Z * [new branch] gh/zpcore/1/head -> origin/gh/zpcore/1/head 2025-12-04T08:57:44.0515821Z * [new branch] gh/zpcore/11/base -> origin/gh/zpcore/11/base 2025-12-04T08:57:44.0517000Z * [new branch] gh/zpcore/11/head -> origin/gh/zpcore/11/head 2025-12-04T08:57:44.0518112Z * [new branch] gh/zpcore/11/orig -> origin/gh/zpcore/11/orig 2025-12-04T08:57:44.0520421Z * [new branch] gh/zpcore/12/base -> origin/gh/zpcore/12/base 
2025-12-04T08:57:44.0521564Z * [new branch] gh/zpcore/12/head -> origin/gh/zpcore/12/head 2025-12-04T08:57:44.0522678Z * [new branch] gh/zpcore/12/orig -> origin/gh/zpcore/12/orig 2025-12-04T08:57:44.0524286Z * [new branch] gh/zpcore/13/base -> origin/gh/zpcore/13/base 2025-12-04T08:57:44.0525311Z * [new branch] gh/zpcore/13/head -> origin/gh/zpcore/13/head 2025-12-04T08:57:44.0526323Z * [new branch] gh/zpcore/13/orig -> origin/gh/zpcore/13/orig 2025-12-04T08:57:44.0527817Z * [new branch] gh/zpcore/14/base -> origin/gh/zpcore/14/base 2025-12-04T08:57:44.0528950Z * [new branch] gh/zpcore/14/head -> origin/gh/zpcore/14/head 2025-12-04T08:57:44.0530047Z * [new branch] gh/zpcore/14/orig -> origin/gh/zpcore/14/orig 2025-12-04T08:57:44.0531801Z * [new branch] gh/zpcore/15/base -> origin/gh/zpcore/15/base 2025-12-04T08:57:44.0532863Z * [new branch] gh/zpcore/15/head -> origin/gh/zpcore/15/head 2025-12-04T08:57:44.0534386Z * [new branch] gh/zpcore/15/orig -> origin/gh/zpcore/15/orig 2025-12-04T08:57:44.0535849Z * [new branch] gh/zpcore/2/base -> origin/gh/zpcore/2/base 2025-12-04T08:57:44.0537059Z * [new branch] gh/zpcore/2/head -> origin/gh/zpcore/2/head 2025-12-04T08:57:44.0539303Z * [new branch] gh/zpcore/21/base -> origin/gh/zpcore/21/base 2025-12-04T08:57:44.0540563Z * [new branch] gh/zpcore/21/head -> origin/gh/zpcore/21/head 2025-12-04T08:57:44.0541768Z * [new branch] gh/zpcore/21/orig -> origin/gh/zpcore/21/orig 2025-12-04T08:57:44.0543567Z * [new branch] gh/zpcore/22/base -> origin/gh/zpcore/22/base 2025-12-04T08:57:44.0544612Z * [new branch] gh/zpcore/22/head -> origin/gh/zpcore/22/head 2025-12-04T08:57:44.0545867Z * [new branch] gh/zpcore/22/orig -> origin/gh/zpcore/22/orig 2025-12-04T08:57:44.0547442Z * [new branch] gh/zpcore/23/base -> origin/gh/zpcore/23/base 2025-12-04T08:57:44.0548589Z * [new branch] gh/zpcore/23/head -> origin/gh/zpcore/23/head 2025-12-04T08:57:44.0549709Z * [new branch] gh/zpcore/23/orig -> origin/gh/zpcore/23/orig 2025-12-04T08:57:44.0550993Z * [new branch] gh/zpcore/24/base -> origin/gh/zpcore/24/base 2025-12-04T08:57:44.0552087Z * [new branch] gh/zpcore/24/head -> origin/gh/zpcore/24/head 2025-12-04T08:57:44.0553204Z * [new branch] gh/zpcore/24/orig -> origin/gh/zpcore/24/orig 2025-12-04T08:57:44.0554862Z * [new branch] gh/zpcore/25/base -> origin/gh/zpcore/25/base 2025-12-04T08:57:44.0555935Z * [new branch] gh/zpcore/25/head -> origin/gh/zpcore/25/head 2025-12-04T08:57:44.0557019Z * [new branch] gh/zpcore/25/orig -> origin/gh/zpcore/25/orig 2025-12-04T08:57:44.0558548Z * [new branch] gh/zpcore/26/base -> origin/gh/zpcore/26/base 2025-12-04T08:57:44.0559696Z * [new branch] gh/zpcore/26/head -> origin/gh/zpcore/26/head 2025-12-04T08:57:44.0560801Z * [new branch] gh/zpcore/26/orig -> origin/gh/zpcore/26/orig 2025-12-04T08:57:44.0562378Z * [new branch] gh/zpcore/27/base -> origin/gh/zpcore/27/base 2025-12-04T08:57:44.0563595Z * [new branch] gh/zpcore/27/head -> origin/gh/zpcore/27/head 2025-12-04T08:57:44.0564676Z * [new branch] gh/zpcore/27/orig -> origin/gh/zpcore/27/orig 2025-12-04T08:57:44.0566692Z * [new branch] gh/zpcore/28/base -> origin/gh/zpcore/28/base 2025-12-04T08:57:44.0568291Z * [new branch] gh/zpcore/28/head -> origin/gh/zpcore/28/head 2025-12-04T08:57:44.0569480Z * [new branch] gh/zpcore/28/orig -> origin/gh/zpcore/28/orig 2025-12-04T08:57:44.0570833Z * [new branch] gh/zpcore/3/base -> origin/gh/zpcore/3/base 2025-12-04T08:57:44.0571853Z * [new branch] gh/zpcore/3/head -> origin/gh/zpcore/3/head 2025-12-04T08:57:44.0573201Z * [new branch] 
gh/zpcore/4/base -> origin/gh/zpcore/4/base 2025-12-04T08:57:44.0574573Z * [new branch] gh/zpcore/4/head -> origin/gh/zpcore/4/head 2025-12-04T08:57:44.0575887Z * [new branch] gh/zpcore/5/base -> origin/gh/zpcore/5/base 2025-12-04T08:57:44.0577007Z * [new branch] gh/zpcore/5/head -> origin/gh/zpcore/5/head 2025-12-04T08:57:44.0578426Z * [new branch] gh/zpcore/6/base -> origin/gh/zpcore/6/base 2025-12-04T08:57:44.0581462Z * [new branch] gh/zpcore/6/head -> origin/gh/zpcore/6/head 2025-12-04T08:57:44.0583216Z * [new branch] gh/zpcore/7/base -> origin/gh/zpcore/7/base 2025-12-04T08:57:44.0584280Z * [new branch] gh/zpcore/7/head -> origin/gh/zpcore/7/head 2025-12-04T08:57:44.0585679Z * [new branch] gh/zpcore/8/base -> origin/gh/zpcore/8/base 2025-12-04T08:57:44.0586788Z * [new branch] gh/zpcore/8/head -> origin/gh/zpcore/8/head 2025-12-04T08:57:44.0588075Z * [new branch] google-main -> origin/google-main 2025-12-04T08:57:44.0589637Z * [new branch] guangyey/external_stream -> origin/guangyey/external_stream 2025-12-04T08:57:44.0591257Z * [new branch] guangyey/test_2025 -> origin/guangyey/test_2025 2025-12-04T08:57:44.0593123Z * [new branch] guilhermeleobas/cherry-pick-55d87d9dfd9 -> origin/guilhermeleobas/cherry-pick-55d87d9dfd9 2025-12-04T08:57:44.0594430Z * [new branch] hameerabbasi/complex_tensor_subclass -> origin/hameerabbasi/complex_tensor_subclass 2025-12-04T08:57:44.0595783Z * [new branch] hameerabbasi/fix-ctensor-gradcheck-tests -> origin/hameerabbasi/fix-ctensor-gradcheck-tests 2025-12-04T08:57:44.0596658Z * [new branch] hameerabbasi/gradcheck-allclose -> origin/hameerabbasi/gradcheck-allclose 2025-12-04T08:57:44.0597695Z * [new branch] hc_baseline -> origin/hc_baseline 2025-12-04T08:57:44.0598834Z * [new branch] hhh_rand -> origin/hhh_rand 2025-12-04T08:57:44.0600272Z * [new branch] huba/f1 -> origin/huba/f1 2025-12-04T08:57:44.0602519Z * [new branch] increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test -> origin/increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test 2025-12-04T08:57:44.0603234Z * [new branch] inlining -> origin/inlining 2025-12-04T08:57:44.0604467Z * [new branch] inlining-ezyang -> origin/inlining-ezyang 2025-12-04T08:57:44.0605634Z * [new branch] install-torchao-0.13.0 -> origin/install-torchao-0.13.0 2025-12-04T08:57:44.0607124Z * [new branch] instrument-trunk-pull-linux-with-job-test-filters -> origin/instrument-trunk-pull-linux-with-job-test-filters 2025-12-04T08:57:44.0607945Z * [new branch] invoke-subgraph -> origin/invoke-subgraph 2025-12-04T08:57:44.0609308Z * [new branch] issue#58739 -> origin/issue#58739 2025-12-04T08:57:44.0610591Z * [new branch] jainapurva-patch-1 -> origin/jainapurva-patch-1 2025-12-04T08:57:44.0611909Z * [new branch] jathu/o3 -> origin/jathu/o3 2025-12-04T08:57:44.0613157Z * [new branch] jathu/sve -> origin/jathu/sve 2025-12-04T08:57:44.0615225Z * [new branch] jcaip/test-cusparselt-version-0.6.2 -> origin/jcaip/test-cusparselt-version-0.6.2 2025-12-04T08:57:44.0616330Z * [new branch] jcaip/update-cusparselt-0.6.2 -> origin/jcaip/update-cusparselt-0.6.2 2025-12-04T08:57:44.0617830Z * [new branch] jiannanWang/memorysnapshot_filter -> origin/jiannanWang/memorysnapshot_filter 2025-12-04T08:57:44.0618971Z * [new branch] jiannanWang/profilerstepwarning -> origin/jiannanWang/profilerstepwarning 2025-12-04T08:57:44.0620156Z * [new branch] jithunnair-amd-patch-1 -> origin/jithunnair-amd-patch-1 2025-12-04T08:57:44.0621445Z * [new branch] jithunnair-amd-patch-10 -> origin/jithunnair-amd-patch-10 2025-12-04T08:57:44.0622658Z * [new branch] 
jithunnair-amd-patch-2 -> origin/jithunnair-amd-patch-2 2025-12-04T08:57:44.0623883Z * [new branch] jithunnair-amd-patch-3 -> origin/jithunnair-amd-patch-3 2025-12-04T08:57:44.0625168Z * [new branch] jithunnair-amd-patch-4 -> origin/jithunnair-amd-patch-4 2025-12-04T08:57:44.0626389Z * [new branch] jithunnair-amd-patch-5 -> origin/jithunnair-amd-patch-5 2025-12-04T08:57:44.0627590Z * [new branch] jithunnair-amd-patch-6 -> origin/jithunnair-amd-patch-6 2025-12-04T08:57:44.0628699Z * [new branch] jithunnair-amd-patch-7 -> origin/jithunnair-amd-patch-7 2025-12-04T08:57:44.0629877Z * [new branch] jithunnair-amd-patch-8 -> origin/jithunnair-amd-patch-8 2025-12-04T08:57:44.0631211Z * [new branch] jithunnair-amd-patch-9 -> origin/jithunnair-amd-patch-9 2025-12-04T08:57:44.0632727Z * [new branch] justinchu/native-qdq -> origin/justinchu/native-qdq 2025-12-04T08:57:44.0634138Z * [new branch] kainan666/xlf_debug -> origin/kainan666/xlf_debug 2025-12-04T08:57:44.0635716Z * [new branch] kainan_test -> origin/kainan_test 2025-12-04T08:57:44.0636803Z * [new branch] larryliu0820-patch-1 -> origin/larryliu0820-patch-1 2025-12-04T08:57:44.0638358Z * [new branch] leslie/test_group_gemm_epilogues -> origin/leslie/test_group_gemm_epilogues 2025-12-04T08:57:44.0640200Z * [new branch] lessw2020/fix_cutlass_cache_error -> origin/lessw2020/fix_cutlass_cache_error 2025-12-04T08:57:44.0641665Z * [new branch] liaoxuan/shm_all_reduce -> origin/liaoxuan/shm_all_reduce 2025-12-04T08:57:44.0642766Z * [new branch] liaoxuan/test_fa_disable_softmax -> origin/liaoxuan/test_fa_disable_softmax 2025-12-04T08:57:44.0643792Z * [new branch] liaoxuan/test_int8_sdpa -> origin/liaoxuan/test_int8_sdpa 2025-12-04T08:57:44.0644867Z * [new branch] llama4-stable -> origin/llama4-stable 2025-12-04T08:57:44.0646805Z * [new branch] lts/release/1.8 -> origin/lts/release/1.8 2025-12-04T08:57:44.0648331Z * [new branch] lucaskabela/#94773 -> origin/lucaskabela/#94773 2025-12-04T08:57:44.0649404Z * [new branch] lucaskabela/fix_164876 -> origin/lucaskabela/fix_164876 2025-12-04T08:57:44.0650448Z * [new branch] lucaskabela/flop_counter -> origin/lucaskabela/flop_counter 2025-12-04T08:57:44.0651535Z * [new branch] lucaskabela/func_under_decomp -> origin/lucaskabela/func_under_decomp 2025-12-04T08:57:44.0652604Z * [new branch] lucaskabela/functional_in_dynamo -> origin/lucaskabela/functional_in_dynamo 2025-12-04T08:57:44.0654106Z * [new branch] lucaskabela/install_params_as_graph_attr -> origin/lucaskabela/install_params_as_graph_attr 2025-12-04T08:57:44.0655470Z * [new branch] lucaskabela/parameters_as_graph_attr -> origin/lucaskabela/parameters_as_graph_attr 2025-12-04T08:57:44.0657000Z * [new branch] lucaskabela/remove_aot_dispatcher_metadata -> origin/lucaskabela/remove_aot_dispatcher_metadata 2025-12-04T08:57:44.0657892Z * [new branch] lucaskabela/rnn_decomp -> origin/lucaskabela/rnn_decomp 2025-12-04T08:57:44.0659107Z * [new branch] lucaskabela/typing_backends -> origin/lucaskabela/typing_backends 2025-12-04T08:57:44.0660226Z * [new branch] lucaskabela/typing_ctx_manager -> origin/lucaskabela/typing_ctx_manager 2025-12-04T08:57:44.0661438Z * [new branch] lucaskabela/typing_nn_module -> origin/lucaskabela/typing_nn_module 2025-12-04T08:57:44.0662639Z * [new branch] lucaskabela/typing_user_defined -> origin/lucaskabela/typing_user_defined 2025-12-04T08:57:44.0663727Z * [new branch] lucaskabela/typing_variables -> origin/lucaskabela/typing_variables 2025-12-04T08:57:44.0664892Z * [new branch] lucaskabela/typing_variables_dicts -> 
origin/lucaskabela/typing_variables_dicts 2025-12-04T08:57:44.0666159Z * [new branch] lucaskabela/typing_variables_functions -> origin/lucaskabela/typing_variables_functions 2025-12-04T08:57:44.0667140Z * [new branch] lucaskabela/typing_variables_lists -> origin/lucaskabela/typing_variables_lists 2025-12-04T08:57:44.0668618Z * [new branch] lw/torch_box_by_ref -> origin/lw/torch_box_by_ref 2025-12-04T08:57:44.0669837Z * [new branch] main -> origin/main 2025-12-04T08:57:44.0671148Z * [new branch] malfet-patch-1 -> origin/malfet-patch-1 2025-12-04T08:57:44.0672377Z * [new branch] malfet-patch-2 -> origin/malfet-patch-2 2025-12-04T08:57:44.0673767Z * [new branch] malfet-patch-3 -> origin/malfet-patch-3 2025-12-04T08:57:44.0674976Z * [new branch] malfet-patch-4 -> origin/malfet-patch-4 2025-12-04T08:57:44.0676262Z * [new branch] malfet-patch-5 -> origin/malfet-patch-5 2025-12-04T08:57:44.0677450Z * [new branch] malfet-patch-6 -> origin/malfet-patch-6 2025-12-04T08:57:44.0678785Z * [new branch] malfet-patch-7 -> origin/malfet-patch-7 2025-12-04T08:57:44.0680273Z * [new branch] malfet-patch-8 -> origin/malfet-patch-8 2025-12-04T08:57:44.0681902Z * [new branch] malfet/add-3.14-ci -> origin/malfet/add-3.14-ci 2025-12-04T08:57:44.0683343Z * [new branch] malfet/be-do-not-make-typos-in-build-artifacts -> origin/malfet/be-do-not-make-typos-in-build-artifacts 2025-12-04T08:57:44.0684623Z * [new branch] malfet/be-move-more-settings-to-checkout-pytorch -> origin/malfet/be-move-more-settings-to-checkout-pytorch 2025-12-04T08:57:44.0686290Z * [new branch] malfet/be-remove-misisng-neon-headers -> origin/malfet/be-remove-misisng-neon-headers 2025-12-04T08:57:44.0687497Z * [new branch] malfet/mps-implement-col2im -> origin/malfet/mps-implement-col2im 2025-12-04T08:57:44.0689149Z * [new branch] manuel/aoti_metal_shimify-thread_safe -> origin/manuel/aoti_metal_shimify-thread_safe 2025-12-04T08:57:44.0690031Z * [new branch] manuel/inductor_link_openmp -> origin/manuel/inductor_link_openmp 2025-12-04T08:57:44.0691647Z * [new branch] masnesral/metaconda -> origin/masnesral/metaconda 2025-12-04T08:57:44.0692897Z * [new branch] mem_profiler_flaky_fix -> origin/mem_profiler_flaky_fix 2025-12-04T08:57:44.0694378Z * [new branch] mem_profiler_stack_trace -> origin/mem_profiler_stack_trace 2025-12-04T08:57:44.0695587Z * [new branch] memory_profiler_stack -> origin/memory_profiler_stack 2025-12-04T08:57:44.0696780Z * [new branch] metascroy-patch-1 -> origin/metascroy-patch-1 2025-12-04T08:57:44.0697944Z * [new branch] mingw_posix -> origin/mingw_posix 2025-12-04T08:57:44.0699611Z * [new branch] mlazos/S429861-debug -> origin/mlazos/S429861-debug 2025-12-04T08:57:44.0700686Z * [new branch] mlazos/aa -> origin/mlazos/aa 2025-12-04T08:57:44.0701799Z * [new branch] mlazos/acts -> origin/mlazos/acts 2025-12-04T08:57:44.0702856Z * [new branch] mlazos/arg-renames -> origin/mlazos/arg-renames 2025-12-04T08:57:44.0703946Z * [new branch] mlazos/bad-cudagraphs -> origin/mlazos/bad-cudagraphs 2025-12-04T08:57:44.0705036Z * [new branch] mlazos/baseline-graph-breaks -> origin/mlazos/baseline-graph-breaks 2025-12-04T08:57:44.0706171Z * [new branch] mlazos/beta-tensor -> origin/mlazos/beta-tensor 2025-12-04T08:57:44.0707170Z * [new branch] mlazos/buffers -> origin/mlazos/buffers 2025-12-04T08:57:44.0708084Z * [new branch] mlazos/buffers2 -> origin/mlazos/buffers2 2025-12-04T08:57:44.0709412Z * [new branch] mlazos/buffers3 -> origin/mlazos/buffers3 2025-12-04T08:57:44.0710693Z * [new branch] mlazos/bwd -> origin/mlazos/bwd 
2025-12-04T08:57:44.0711687Z * [new branch] mlazos/combo-test -> origin/mlazos/combo-test 2025-12-04T08:57:44.0712787Z * [new branch] mlazos/ctx-cleanup -> origin/mlazos/ctx-cleanup 2025-12-04T08:57:44.0713868Z * [new branch] mlazos/cuda-cmd-log -> origin/mlazos/cuda-cmd-log 2025-12-04T08:57:44.0715140Z * [new branch] mlazos/cudagraph-tests -> origin/mlazos/cudagraph-tests 2025-12-04T08:57:44.0716366Z * [new branch] mlazos/cudagraphs-measurement -> origin/mlazos/cudagraphs-measurement 2025-12-04T08:57:44.0717484Z * [new branch] mlazos/cutlass-test -> origin/mlazos/cutlass-test 2025-12-04T08:57:44.0718560Z * [new branch] mlazos/cutlass-topo-bug -> origin/mlazos/cutlass-topo-bug 2025-12-04T08:57:44.0719870Z * [new branch] mlazos/dataclass-proxy -> origin/mlazos/dataclass-proxy 2025-12-04T08:57:44.0720742Z * [new branch] mlazos/dc-attrs -> origin/mlazos/dc-attrs 2025-12-04T08:57:44.0721881Z * [new branch] mlazos/dc-helion -> origin/mlazos/dc-helion 2025-12-04T08:57:44.0722979Z * [new branch] mlazos/dict-fix -> origin/mlazos/dict-fix 2025-12-04T08:57:44.0724069Z * [new branch] mlazos/disable-tf -> origin/mlazos/disable-tf 2025-12-04T08:57:44.0725150Z * [new branch] mlazos/dupe-fix -> origin/mlazos/dupe-fix 2025-12-04T08:57:44.0726300Z * [new branch] mlazos/dyn-batch -> origin/mlazos/dyn-batch 2025-12-04T08:57:44.0727360Z * [new branch] mlazos/evt -> origin/mlazos/evt 2025-12-04T08:57:44.0728490Z * [new branch] mlazos/extract-examples -> origin/mlazos/extract-examples 2025-12-04T08:57:44.0729557Z * [new branch] mlazos/foreach-op -> origin/mlazos/foreach-op 2025-12-04T08:57:44.0730607Z * [new branch] mlazos/fp8 -> origin/mlazos/fp8 2025-12-04T08:57:44.0731695Z * [new branch] mlazos/fp8-bias -> origin/mlazos/fp8-bias 2025-12-04T08:57:44.0732837Z * [new branch] mlazos/fp8-bias-fusion -> origin/mlazos/fp8-bias-fusion 2025-12-04T08:57:44.0734352Z * [new branch] mlazos/fp8-fixes -> origin/mlazos/fp8-fixes 2025-12-04T08:57:44.0735677Z * [new branch] mlazos/freezing -> origin/mlazos/freezing 2025-12-04T08:57:44.0736771Z * [new branch] mlazos/h-comp -> origin/mlazos/h-comp 2025-12-04T08:57:44.0737961Z * [new branch] mlazos/h-comp2 -> origin/mlazos/h-comp2 2025-12-04T08:57:44.0739135Z * [new branch] mlazos/hash-hop -> origin/mlazos/hash-hop 2025-12-04T08:57:44.0740316Z * [new branch] mlazos/hc -> origin/mlazos/hc 2025-12-04T08:57:44.0741527Z * [new branch] mlazos/hc-cycles -> origin/mlazos/hc-cycles 2025-12-04T08:57:44.0742634Z * [new branch] mlazos/hc-fixes -> origin/mlazos/hc-fixes 2025-12-04T08:57:44.0743732Z * [new branch] mlazos/hc-fixes3 -> origin/mlazos/hc-fixes3 2025-12-04T08:57:44.0744813Z * [new branch] mlazos/hc-fixes4 -> origin/mlazos/hc-fixes4 2025-12-04T08:57:44.0745993Z * [new branch] mlazos/hc-hf -> origin/mlazos/hc-hf 2025-12-04T08:57:44.0747154Z * [new branch] mlazos/hc-mut -> origin/mlazos/hc-mut 2025-12-04T08:57:44.0748193Z * [new branch] mlazos/hc10 -> origin/mlazos/hc10 2025-12-04T08:57:44.0749259Z * [new branch] mlazos/hc11 -> origin/mlazos/hc11 2025-12-04T08:57:44.0750340Z * [new branch] mlazos/hc12 -> origin/mlazos/hc12 2025-12-04T08:57:44.0751349Z * [new branch] mlazos/hc13 -> origin/mlazos/hc13 2025-12-04T08:57:44.0752542Z * [new branch] mlazos/hc14 -> origin/mlazos/hc14 2025-12-04T08:57:44.0753664Z * [new branch] mlazos/hc15 -> origin/mlazos/hc15 2025-12-04T08:57:44.0754795Z * [new branch] mlazos/hc2 -> origin/mlazos/hc2 2025-12-04T08:57:44.0755836Z * [new branch] mlazos/hc4 -> origin/mlazos/hc4 2025-12-04T08:57:44.0756923Z * [new branch] mlazos/hc5 -> origin/mlazos/hc5 
2025-12-04T08:57:44.0757957Z * [new branch] mlazos/hc6 -> origin/mlazos/hc6 2025-12-04T08:57:44.0759062Z * [new branch] mlazos/hc7 -> origin/mlazos/hc7 2025-12-04T08:57:44.0760026Z * [new branch] mlazos/hc8 -> origin/mlazos/hc8 2025-12-04T08:57:44.0761151Z * [new branch] mlazos/hc9 -> origin/mlazos/hc9 2025-12-04T08:57:44.0762646Z * [new branch] mlazos/hc_baseline2 -> origin/mlazos/hc_baseline2 2025-12-04T08:57:44.0763666Z * [new branch] mlazos/inductor-streams -> origin/mlazos/inductor-streams 2025-12-04T08:57:44.0764585Z * [new branch] mlazos/main -> origin/mlazos/main 2025-12-04T08:57:44.0765639Z * [new branch] mlazos/mcg2 -> origin/mlazos/mcg2 2025-12-04T08:57:44.0766755Z * [new branch] mlazos/meta-guards -> origin/mlazos/meta-guards 2025-12-04T08:57:44.0768392Z * [new branch] mlazos/mlazos/foreach-map-adam -> origin/mlazos/mlazos/foreach-map-adam 2025-12-04T08:57:44.0769546Z * [new branch] mlazos/mlazos/tf-mode-backup -> origin/mlazos/mlazos/tf-mode-backup 2025-12-04T08:57:44.0770588Z * [new branch] mlazos/mod-fix -> origin/mlazos/mod-fix 2025-12-04T08:57:44.0772192Z * [new branch] mlazos/mode-fix -> origin/mlazos/mode-fix 2025-12-04T08:57:44.0773346Z * [new branch] mlazos/offsets -> origin/mlazos/offsets 2025-12-04T08:57:44.0774622Z * [new branch] mlazos/overguarding -> origin/mlazos/overguarding 2025-12-04T08:57:44.0775808Z * [new branch] mlazos/proxy-ctors -> origin/mlazos/proxy-ctors 2025-12-04T08:57:44.0776917Z * [new branch] mlazos/quant-fix -> origin/mlazos/quant-fix 2025-12-04T08:57:44.0778094Z * [new branch] mlazos/resnet-fix -> origin/mlazos/resnet-fix 2025-12-04T08:57:44.0779468Z * [new branch] mlazos/rm-buf-names -> origin/mlazos/rm-buf-names 2025-12-04T08:57:44.0780563Z * [new branch] mlazos/rm-code -> origin/mlazos/rm-code 2025-12-04T08:57:44.0781692Z * [new branch] mlazos/rm-spam -> origin/mlazos/rm-spam 2025-12-04T08:57:44.0782854Z * [new branch] mlazos/rtp -> origin/mlazos/rtp 2025-12-04T08:57:44.0784063Z * [new branch] mlazos/static-idx-dbg -> origin/mlazos/static-idx-dbg 2025-12-04T08:57:44.0785240Z * [new branch] mlazos/static-inputs-log -> origin/mlazos/static-inputs-log 2025-12-04T08:57:44.0786180Z * [new branch] mlazos/stests -> origin/mlazos/stests 2025-12-04T08:57:44.0787376Z * [new branch] mlazos/stream-ops -> origin/mlazos/stream-ops 2025-12-04T08:57:44.0788589Z * [new branch] mlazos/td-fix2 -> origin/mlazos/td-fix2 2025-12-04T08:57:44.0789742Z * [new branch] mlazos/tensor-hasattr2 -> origin/mlazos/tensor-hasattr2 2025-12-04T08:57:44.0790937Z * [new branch] mlazos/test -> origin/mlazos/test 2025-12-04T08:57:44.0792017Z * [new branch] mlazos/tf-mode -> origin/mlazos/tf-mode 2025-12-04T08:57:44.0793312Z * [new branch] mlazos/tf-mode-backup2 -> origin/mlazos/tf-mode-backup2 2025-12-04T08:57:44.0794415Z * [new branch] mlazos/tf-mode-reland -> origin/mlazos/tf-mode-reland 2025-12-04T08:57:44.0795653Z * [new branch] mlazos/tf-mode-reland2 -> origin/mlazos/tf-mode-reland2 2025-12-04T08:57:44.0796737Z * [new branch] mlazos/tf-mode-reland3 -> origin/mlazos/tf-mode-reland3 2025-12-04T08:57:44.0797831Z * [new branch] mlazos/triton-no-epi -> origin/mlazos/triton-no-epi 2025-12-04T08:57:44.0798927Z * [new branch] mlazos/tune-proto -> origin/mlazos/tune-proto 2025-12-04T08:57:44.0800008Z * [new branch] mlazos/tuple-fixes -> origin/mlazos/tuple-fixes 2025-12-04T08:57:44.0801115Z * [new branch] mlazos/tuple-fixes2 -> origin/mlazos/tuple-fixes2 2025-12-04T08:57:44.0802391Z * [new branch] mlazos/tuple-handling -> origin/mlazos/tuple-handling 2025-12-04T08:57:44.0803480Z * 
[new branch] mlazos/user-stream-base -> origin/mlazos/user-stream-base 2025-12-04T08:57:44.0804549Z * [new branch] mlazos/user-streams -> origin/mlazos/user-streams 2025-12-04T08:57:44.0805720Z * [new branch] mlazos/user-streams-backup -> origin/mlazos/user-streams-backup 2025-12-04T08:57:44.0807209Z * [new branch] mlazos/user-streams-backup2 -> origin/mlazos/user-streams-backup2 2025-12-04T08:57:44.0808205Z * [new branch] mlazos/vary-beta -> origin/mlazos/vary-beta 2025-12-04T08:57:44.0809071Z * [new branch] mlazos/vary-beta2 -> origin/mlazos/vary-beta2 2025-12-04T08:57:44.0810136Z * [new branch] mlazos/weird-perf1 -> origin/mlazos/weird-perf1 2025-12-04T08:57:44.0811322Z * [new branch] mm_out_dtype_compile -> origin/mm_out_dtype_compile 2025-12-04T08:57:44.0812477Z * [new branch] module-shim -> origin/module-shim 2025-12-04T08:57:44.0813942Z * [new branch] move_config -> origin/move_config 2025-12-04T08:57:44.0815614Z * [new branch] msaroufim/reduce -> origin/msaroufim/reduce 2025-12-04T08:57:44.0817468Z * [new branch] mtia/basic-cmake -> origin/mtia/basic-cmake 2025-12-04T08:57:44.0819092Z * [new branch] mwizak/fix-triton-block-shape -> origin/mwizak/fix-triton-block-shape 2025-12-04T08:57:44.0820128Z * [new branch] my_varlen_backup -> origin/my_varlen_backup 2025-12-04T08:57:44.0821342Z * [new branch] nativert_num_outputs -> origin/nativert_num_outputs 2025-12-04T08:57:44.0822452Z * [new branch] new-codegen -> origin/new-codegen 2025-12-04T08:57:44.0823891Z * [new branch] newtest-base -> origin/newtest-base 2025-12-04T08:57:44.0825568Z * [new branch] ngimel/addmm_dtype -> origin/ngimel/addmm_dtype 2025-12-04T08:57:44.0826514Z * [new branch] ngimel/div_inv -> origin/ngimel/div_inv 2025-12-04T08:57:44.0827591Z * [new branch] ngimel/error_index_list -> origin/ngimel/error_index_list 2025-12-04T08:57:44.0828646Z * [new branch] ngimel/gather_grid -> origin/ngimel/gather_grid 2025-12-04T08:57:44.0829797Z * [new branch] ngimel/gather_grid_release -> origin/ngimel/gather_grid_release 2025-12-04T08:57:44.0830760Z * [new branch] ngimel/gg_new -> origin/ngimel/gg_new 2025-12-04T08:57:44.0831765Z * [new branch] ngimel/hostalloc -> origin/ngimel/hostalloc 2025-12-04T08:57:44.0832792Z * [new branch] ngimel/storage_id -> origin/ngimel/storage_id 2025-12-04T08:57:44.0833965Z * [new branch] nightly -> origin/nightly 2025-12-04T08:57:44.0835722Z * [new branch] nikitaved/addmm_1_rowcol_lt_path_check -> origin/nikitaved/addmm_1_rowcol_lt_path_check 2025-12-04T08:57:44.0836776Z * [new branch] nikitaved/addmm_epilogue_fusions_2d_bias -> origin/nikitaved/addmm_epilogue_fusions_2d_bias 2025-12-04T08:57:44.0837834Z * [new branch] nikitaved/addmm_epilogue_fusions_inductor -> origin/nikitaved/addmm_epilogue_fusions_inductor 2025-12-04T08:57:44.0839152Z * [new branch] nikitaved/addmm_epilogue_fusions_scratch -> origin/nikitaved/addmm_epilogue_fusions_scratch 2025-12-04T08:57:44.0840547Z * [new branch] nikitaved/grad_addmm_epilogue_fusions -> origin/nikitaved/grad_addmm_epilogue_fusions 2025-12-04T08:57:44.0842035Z * [new branch] nikitaved/simpler_can_use_32bit_index -> origin/nikitaved/simpler_can_use_32bit_index 2025-12-04T08:57:44.0843020Z * [new branch] nikitaved/test -> origin/nikitaved/test 2025-12-04T08:57:44.0844806Z * [new branch] nmacchioni-perf-test-async-autotune -> origin/nmacchioni-perf-test-async-autotune 2025-12-04T08:57:44.0845662Z * [new branch] no_distributed_log_spew -> origin/no_distributed_log_spew 2025-12-04T08:57:44.0846826Z * [new branch] nofun-hack -> origin/nofun-hack 
2025-12-04T08:57:44.0847968Z * [new branch] norm_bench -> origin/norm_bench 2025-12-04T08:57:44.0849538Z * [new branch] nullplay/fuse_matmul -> origin/nullplay/fuse_matmul 2025-12-04T08:57:44.0850632Z * [new branch] nullplay_fuse_matmul -> origin/nullplay_fuse_matmul 2025-12-04T08:57:44.0851797Z * [new branch] optimizer_test -> origin/optimizer_test 2025-12-04T08:57:44.0854027Z * [new branch] orig/release/1.10 -> origin/orig/release/1.10 2025-12-04T08:57:44.0855149Z * [new branch] orig/release/1.11 -> origin/orig/release/1.11 2025-12-04T08:57:44.0856585Z * [new branch] orig/release/1.12 -> origin/orig/release/1.12 2025-12-04T08:57:44.0858030Z * [new branch] orig/release/1.13 -> origin/orig/release/1.13 2025-12-04T08:57:44.0859231Z * [new branch] orig/release/1.6 -> origin/orig/release/1.6 2025-12-04T08:57:44.0860660Z * [new branch] orig/release/1.7 -> origin/orig/release/1.7 2025-12-04T08:57:44.0861796Z * [new branch] orig/release/1.8 -> origin/orig/release/1.8 2025-12-04T08:57:44.0862976Z * [new branch] orig/release/1.9 -> origin/orig/release/1.9 2025-12-04T08:57:44.0864198Z * [new branch] orig/release/2.0 -> origin/orig/release/2.0 2025-12-04T08:57:44.0865409Z * [new branch] orig/release/2.1 -> origin/orig/release/2.1 2025-12-04T08:57:44.0866524Z * [new branch] orig/release/2.2 -> origin/orig/release/2.2 2025-12-04T08:57:44.0867604Z * [new branch] orig/release/2.3 -> origin/orig/release/2.3 2025-12-04T08:57:44.0868713Z * [new branch] orig/release/2.4 -> origin/orig/release/2.4 2025-12-04T08:57:44.0869769Z * [new branch] orig/release/2.5 -> origin/orig/release/2.5 2025-12-04T08:57:44.0870825Z * [new branch] orig/release/2.6 -> origin/orig/release/2.6 2025-12-04T08:57:44.0872299Z * [new branch] orig/release/2.7 -> origin/orig/release/2.7 2025-12-04T08:57:44.0873860Z * [new branch] orig/release/2.8 -> origin/orig/release/2.8 2025-12-04T08:57:44.0874832Z * [new branch] orig/release/2.9 -> origin/orig/release/2.9 2025-12-04T08:57:44.0877357Z * [new branch] origin/gh/fxdawnn/1/base -> origin/origin/gh/fxdawnn/1/base 2025-12-04T08:57:44.0878300Z * [new branch] origin/gh/fxdawnn/1/orig -> origin/origin/gh/fxdawnn/1/orig 2025-12-04T08:57:44.0881121Z * [new branch] origin/gh/zpcore/14/orig -> origin/origin/gh/zpcore/14/orig 2025-12-04T08:57:44.0882415Z * [new branch] oulgen-patch-1 -> origin/oulgen-patch-1 2025-12-04T08:57:44.0883764Z * [new branch] oulgen-patch-2 -> origin/oulgen-patch-2 2025-12-04T08:57:44.0884958Z * [new branch] oulgen-patch-3 -> origin/oulgen-patch-3 2025-12-04T08:57:44.0886317Z * [new branch] oulgen-patch-4 -> origin/oulgen-patch-4 2025-12-04T08:57:44.0887438Z * [new branch] padded-tensor -> origin/padded-tensor 2025-12-04T08:57:44.0888709Z * [new branch] pca2 -> origin/pca2 2025-12-04T08:57:44.0890027Z * [new branch] per_channel_backup -> origin/per_channel_backup 2025-12-04T08:57:44.0891270Z * [new branch] perf_ops -> origin/perf_ops 2025-12-04T08:57:44.0892602Z * [new branch] perf_ops_2_9 -> origin/perf_ops_2_9 2025-12-04T08:57:44.0894366Z * [new branch] pianpwk-patch-1 -> origin/pianpwk-patch-1 2025-12-04T08:57:44.0895688Z * [new branch] pianpwk/__draft_debug_mode -> origin/pianpwk/__draft_debug_mode 2025-12-04T08:57:44.0896871Z * [new branch] pianpwk/_debug_mode_for_triton_draft -> origin/pianpwk/_debug_mode_for_triton_draft 2025-12-04T08:57:44.0897892Z * [new branch] pianpwk/_debug_nn_module_compile -> origin/pianpwk/_debug_nn_module_compile 2025-12-04T08:57:44.0898936Z * [new branch] pianpwk/_draft_triton_11_3 -> origin/pianpwk/_draft_triton_11_3 
2025-12-04T08:57:44.0900172Z * [new branch] pianpwk/_manual_bucket_draft -> origin/pianpwk/_manual_bucket_draft 2025-12-04T08:57:44.0901596Z * [new branch] pianpwk/_profile_w_dispatch_keys -> origin/pianpwk/_profile_w_dispatch_keys 2025-12-04T08:57:44.0902985Z * [new branch] pianpwk/_super_draft_debug_mode -> origin/pianpwk/_super_draft_debug_mode 2025-12-04T08:57:44.0904269Z * [new branch] pianpwk/_unbacked_local_shard_size -> origin/pianpwk/_unbacked_local_shard_size 2025-12-04T08:57:44.0905293Z * [new branch] pianpwk/anomaly_tb -> origin/pianpwk/anomaly_tb 2025-12-04T08:57:44.0906530Z * [new branch] pianpwk/auto_fx_annotate -> origin/pianpwk/auto_fx_annotate 2025-12-04T08:57:44.0907680Z * [new branch] pianpwk/backed_size_oblivious_export -> origin/pianpwk/backed_size_oblivious_export 2025-12-04T08:57:44.0908686Z * [new branch] pianpwk/bert_dynamic_perf -> origin/pianpwk/bert_dynamic_perf 2025-12-04T08:57:44.0909924Z * [new branch] pianpwk/debug_fwd_stack_traces -> origin/pianpwk/debug_fwd_stack_traces 2025-12-04T08:57:44.0911147Z * [new branch] pianpwk/debug_hash_tensor -> origin/pianpwk/debug_hash_tensor 2025-12-04T08:57:44.0912259Z * [new branch] pianpwk/debug_mode_annotate -> origin/pianpwk/debug_mode_annotate 2025-12-04T08:57:44.0913278Z * [new branch] pianpwk/debug_mode_defaults -> origin/pianpwk/debug_mode_defaults 2025-12-04T08:57:44.0914333Z * [new branch] pianpwk/debug_mode_hacks -> origin/pianpwk/debug_mode_hacks 2025-12-04T08:57:44.0915460Z * [new branch] pianpwk/debug_mode_opcall_refactor -> origin/pianpwk/debug_mode_opcall_refactor 2025-12-04T08:57:44.0916519Z * [new branch] pianpwk/debug_mode_show_ids -> origin/pianpwk/debug_mode_show_ids 2025-12-04T08:57:44.0917597Z * [new branch] pianpwk/debug_mode_triton -> origin/pianpwk/debug_mode_triton 2025-12-04T08:57:44.0918840Z * [new branch] pianpwk/debug_show_stack_trace -> origin/pianpwk/debug_show_stack_trace 2025-12-04T08:57:44.0919974Z * [new branch] pianpwk/debug_wait_on_collective -> origin/pianpwk/debug_wait_on_collective 2025-12-04T08:57:44.0921152Z * [new branch] pianpwk/debugmode_compile_tf -> origin/pianpwk/debugmode_compile_tf 2025-12-04T08:57:44.0922442Z * [new branch] pianpwk/dispatch_key_debugging_for_debug -> origin/pianpwk/dispatch_key_debugging_for_debug 2025-12-04T08:57:44.0923471Z * [new branch] pianpwk/draft_debug_mode_tfcompile -> origin/pianpwk/draft_debug_mode_tfcompile 2025-12-04T08:57:44.0924550Z * [new branch] pianpwk/draft_multikernel_nn -> origin/pianpwk/draft_multikernel_nn 2025-12-04T08:57:44.0925699Z * [new branch] pianpwk/draft_multikernel_status_10_5 -> origin/pianpwk/draft_multikernel_status_10_5 2025-12-04T08:57:44.0926779Z * [new branch] pianpwk/dtensor_custom_chunk -> origin/pianpwk/dtensor_custom_chunk 2025-12-04T08:57:44.0928030Z * [new branch] pianpwk/dtensor_unbacked_keypath -> origin/pianpwk/dtensor_unbacked_keypath 2025-12-04T08:57:44.0929273Z * [new branch] pianpwk/event_list_tree -> origin/pianpwk/event_list_tree 2025-12-04T08:57:44.0930571Z * [new branch] pianpwk/false_numel_refs -> origin/pianpwk/false_numel_refs 2025-12-04T08:57:44.0931548Z * [new branch] pianpwk/maybe_guard_rel -> origin/pianpwk/maybe_guard_rel 2025-12-04T08:57:44.0932772Z * [new branch] pianpwk/multikernel_hints_draft -> origin/pianpwk/multikernel_hints_draft 2025-12-04T08:57:44.0934266Z * [new branch] pianpwk/no_size_oblivious_slice_scat -> origin/pianpwk/no_size_oblivious_slice_scat 2025-12-04T08:57:44.0935943Z * [new branch] pianpwk/oblivious_reshape_view_better -> 
origin/pianpwk/oblivious_reshape_view_better 2025-12-04T08:57:44.0936851Z * [new branch] pianpwk/pre_forward_hook -> origin/pianpwk/pre_forward_hook 2025-12-04T08:57:44.0938058Z * [new branch] pianpwk/skip_python_keys_alternate -> origin/pianpwk/skip_python_keys_alternate 2025-12-04T08:57:44.0939180Z * [new branch] pianpwk/skip_python_keys_in_guards -> origin/pianpwk/skip_python_keys_in_guards 2025-12-04T08:57:44.0940227Z * [new branch] pianpwk/sym_tokens_draft -> origin/pianpwk/sym_tokens_draft 2025-12-04T08:57:44.0941324Z * [new branch] pianpwk/symint_one_hot -> origin/pianpwk/symint_one_hot 2025-12-04T08:57:44.0942691Z * [new branch] pianpwk/test_pointwise_guard_or_false -> origin/pianpwk/test_pointwise_guard_or_false 2025-12-04T08:57:44.0943722Z * [new branch] pianpwk/totally_draft_sym_wrap -> origin/pianpwk/totally_draft_sym_wrap 2025-12-04T08:57:44.0944774Z * [new branch] pianpwk/try_dumb_stuff -> origin/pianpwk/try_dumb_stuff 2025-12-04T08:57:44.0946005Z * [new branch] pianpwk/try_dumb_stuff_2 -> origin/pianpwk/try_dumb_stuff_2 2025-12-04T08:57:44.0947142Z * [new branch] pianpwk/unbacked_dtensor_mm -> origin/pianpwk/unbacked_dtensor_mm 2025-12-04T08:57:44.0948299Z * [new branch] pianpwk/unbacked_tracing_12_2 -> origin/pianpwk/unbacked_tracing_12_2 2025-12-04T08:57:44.0949333Z * [new branch] pianpwk/user_symints -> origin/pianpwk/user_symints 2025-12-04T08:57:44.0950398Z * [new branch] pianpwk/wan21_reshape -> origin/pianpwk/wan21_reshape 2025-12-04T08:57:44.0952052Z * [new branch] piz/fix_partial_backward_1112 -> origin/piz/fix_partial_backward_1112 2025-12-04T08:57:44.0952923Z * [new branch] piz/prop_cache_clean -> origin/piz/prop_cache_clean 2025-12-04T08:57:44.0954084Z * [new branch] pool-separate -> origin/pool-separate 2025-12-04T08:57:44.0955248Z * [new branch] pr-156087 -> origin/pr-156087 2025-12-04T08:57:44.0956843Z * [new branch] pr/131860 -> origin/pr/131860 2025-12-04T08:57:44.0957970Z * [new branch] predispatch_to -> origin/predispatch_to 2025-12-04T08:57:44.0959585Z * [new branch] protect-c17 -> origin/protect-c17 2025-12-04T08:57:44.0960727Z * [new branch] pt-opt-cuda3 -> origin/pt-opt-cuda3 2025-12-04T08:57:44.0962491Z * [new branch] python_compiled_autograd -> origin/python_compiled_autograd 2025-12-04T08:57:44.0964169Z * [new branch] q1l1/fix_device_moved_constant_type_unknown -> origin/q1l1/fix_device_moved_constant_type_unknown 2025-12-04T08:57:44.0965378Z * [new branch] q1l1/fix_wrong_default_type_for_kernel_call_args -> origin/q1l1/fix_wrong_default_type_for_kernel_call_args 2025-12-04T08:57:44.0967297Z * [new branch] qchip/export-D54134695 -> origin/qchip/export-D54134695 2025-12-04T08:57:44.0968519Z * [new branch] quote-pytest_cache -> origin/quote-pytest_cache 2025-12-04T08:57:44.0970036Z * [new branch] reland-accgrad-stream-warn -> origin/reland-accgrad-stream-warn 2025-12-04T08:57:44.0971605Z * [new branch] release/1.10 -> origin/release/1.10 2025-12-04T08:57:44.0972788Z * [new branch] release/1.11 -> origin/release/1.11 2025-12-04T08:57:44.0974373Z * [new branch] release/1.12 -> origin/release/1.12 2025-12-04T08:57:44.0975394Z * [new branch] release/1.13 -> origin/release/1.13 2025-12-04T08:57:44.0976461Z * [new branch] release/1.4 -> origin/release/1.4 2025-12-04T08:57:44.0977394Z * [new branch] release/1.4.1 -> origin/release/1.4.1 2025-12-04T08:57:44.0978538Z * [new branch] release/1.5 -> origin/release/1.5 2025-12-04T08:57:44.0980086Z * [new branch] release/1.6 -> origin/release/1.6 2025-12-04T08:57:44.0981176Z * [new branch] release/1.7 -> 
origin/release/1.7 2025-12-04T08:57:44.0982573Z * [new branch] release/1.8 -> origin/release/1.8 2025-12-04T08:57:44.0983742Z * [new branch] release/1.9 -> origin/release/1.9 2025-12-04T08:57:44.0985151Z * [new branch] release/2.0 -> origin/release/2.0 2025-12-04T08:57:44.0986350Z * [new branch] release/2.1 -> origin/release/2.1 2025-12-04T08:57:44.0987619Z * [new branch] release/2.2 -> origin/release/2.2 2025-12-04T08:57:44.0989087Z * [new branch] release/2.3 -> origin/release/2.3 2025-12-04T08:57:44.0990754Z * [new branch] release/2.4 -> origin/release/2.4 2025-12-04T08:57:44.0992349Z * [new branch] release/2.5 -> origin/release/2.5 2025-12-04T08:57:44.0993629Z * [new branch] release/2.6 -> origin/release/2.6 2025-12-04T08:57:44.0994921Z * [new branch] release/2.7 -> origin/release/2.7 2025-12-04T08:57:44.0996073Z * [new branch] release/2.8 -> origin/release/2.8 2025-12-04T08:57:44.0997370Z * [new branch] release/2.9 -> origin/release/2.9 2025-12-04T08:57:44.0998605Z * [new branch] release_notes -> origin/release_notes 2025-12-04T08:57:44.0999773Z * [new branch] remove_pyinterpreter -> origin/remove_pyinterpreter 2025-12-04T08:57:44.1001306Z * [new branch] replace-pytorch-labs-20250812-195836 -> origin/replace-pytorch-labs-20250812-195836 2025-12-04T08:57:44.1002334Z * [new branch] replace-pytorch-labs-20250812-200248 -> origin/replace-pytorch-labs-20250812-200248 2025-12-04T08:57:44.1003320Z * [new branch] replace-pytorch-labs-20250812-200324 -> origin/replace-pytorch-labs-20250812-200324 2025-12-04T08:57:44.1004573Z * [new branch] replace-pytorch-labs-20250812-204020 -> origin/replace-pytorch-labs-20250812-204020 2025-12-04T08:57:44.1006799Z * [new branch] revert-131069-gh/krzysztofjordan/1/head -> origin/revert-131069-gh/krzysztofjordan/1/head 2025-12-04T08:57:44.1009310Z * [new branch] revert-131469-gh/andrewor14/51/head -> origin/revert-131469-gh/andrewor14/51/head 2025-12-04T08:57:44.1011475Z * [new branch] revert-152361-gh/fadara01/1/head -> origin/revert-152361-gh/fadara01/1/head 2025-12-04T08:57:44.1013953Z * [new branch] revert-156870-gh/skarjala/3/head -> origin/revert-156870-gh/skarjala/3/head 2025-12-04T08:57:44.1015436Z * [new branch] revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ -> origin/revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ 2025-12-04T08:57:44.1016721Z * [new branch] revert-hoo-invoke-subgraph -> origin/revert-hoo-invoke-subgraph 2025-12-04T08:57:44.1017722Z * [new branch] revert_always_build_distributed -> origin/revert_always_build_distributed 2025-12-04T08:57:44.1018849Z * [new branch] rms_norm_patch -> origin/rms_norm_patch 2025-12-04T08:57:44.1020529Z * [new branch] ruisi/fix_all_to_all_estimation -> origin/ruisi/fix_all_to_all_estimation 2025-12-04T08:57:44.1021713Z * [new branch] ruisi/fix_comm_estimation -> origin/ruisi/fix_comm_estimation 2025-12-04T08:57:44.1022718Z * [new branch] ruisi/fix_dynamic_shape_estimation -> origin/ruisi/fix_dynamic_shape_estimation 2025-12-04T08:57:44.1023729Z * [new branch] ruisi/fix_llama3_autobucketing -> origin/ruisi/fix_llama3_autobucketing 2025-12-04T08:57:44.1025065Z * [new branch] ruisi/fix_manual_bucketing_ep_pass -> origin/ruisi/fix_manual_bucketing_ep_pass 2025-12-04T08:57:44.1026520Z * [new branch] ruisi/manual_bucket_pass -> origin/ruisi/manual_bucket_pass 2025-12-04T08:57:44.1028378Z * [new branch] ryanguo99/cleanup-dynamo-expected-failures -> origin/ryanguo99/cleanup-dynamo-expected-failures 2025-12-04T08:57:44.1029260Z * [new branch] ryanguo99/fix-closure-var -> origin/ryanguo99/fix-closure-var 
2025-12-04T08:57:44.1030775Z * [new branch] rzou/faketensor_bench -> origin/rzou/faketensor_bench 2025-12-04T08:57:44.1031716Z * [new branch] rzou/njt -> origin/rzou/njt 2025-12-04T08:57:44.1032861Z * [new branch] rzou/pca -> origin/rzou/pca 2025-12-04T08:57:44.1033891Z * [new branch] rzou/realprop -> origin/rzou/realprop 2025-12-04T08:57:44.1035131Z * [new branch] samplevllm -> origin/samplevllm 2025-12-04T08:57:44.1037069Z * [new branch] sanchitintel/weird_thing_with_test_cpu_select_algorithm -> origin/sanchitintel/weird_thing_with_test_cpu_select_algorithm 2025-12-04T08:57:44.1038127Z * [new branch] sapling-pr-archive-SS-JIA -> origin/sapling-pr-archive-SS-JIA 2025-12-04T08:57:44.1039840Z * [new branch] sapling-pr-archive-tushar00jain -> origin/sapling-pr-archive-tushar00jain 2025-12-04T08:57:44.1040820Z * [new branch] save -> origin/save 2025-12-04T08:57:44.1042050Z * [new branch] scaled_mm -> origin/scaled_mm 2025-12-04T08:57:44.1043208Z * [new branch] scan_attempt -> origin/scan_attempt 2025-12-04T08:57:44.1044743Z * [new branch] sdym/2.5.1 -> origin/sdym/2.5.1 2025-12-04T08:57:44.1045998Z * [new branch] sekyondaMeta-dynamoconfig-fix -> origin/sekyondaMeta-dynamoconfig-fix 2025-12-04T08:57:44.1047517Z * [new branch] shengf/fx-xform-perf -> origin/shengf/fx-xform-perf 2025-12-04T08:57:44.1048729Z * [new branch] shoumikhin-patch-1 -> origin/shoumikhin-patch-1 2025-12-04T08:57:44.1049916Z * [new branch] solve-accuracy-fix -> origin/solve-accuracy-fix 2025-12-04T08:57:44.1051096Z * [new branch] some_rocm_inductor_skips -> origin/some_rocm_inductor_skips 2025-12-04T08:57:44.1052571Z * [new branch] soulitzer/stash-tls-ac -> origin/soulitzer/stash-tls-ac 2025-12-04T08:57:44.1054115Z * [new branch] sparse-mm-bf16-support -> origin/sparse-mm-bf16-support 2025-12-04T08:57:44.1055572Z * [new branch] starterTaskUpdate -> origin/starterTaskUpdate 2025-12-04T08:57:44.1056718Z * [new branch] suo -> origin/suo 2025-12-04T08:57:44.1057981Z * [new branch] sve-poc -> origin/sve-poc 2025-12-04T08:57:44.1059261Z * [new branch] switch-bn -> origin/switch-bn 2025-12-04T08:57:44.1060466Z * [new branch] sy_annotation_in_autograd_hop -> origin/sy_annotation_in_autograd_hop 2025-12-04T08:57:44.1061601Z * [new branch] sy_aot_eager_record -> origin/sy_aot_eager_record 2025-12-04T08:57:44.1062782Z * [new branch] sy_custom_bucketing -> origin/sy_custom_bucketing 2025-12-04T08:57:44.1063964Z * [new branch] sy_debug_mode_test -> origin/sy_debug_mode_test 2025-12-04T08:57:44.1065294Z * [new branch] sy_deserialize -> origin/sy_deserialize 2025-12-04T08:57:44.1066585Z * [new branch] sy_dump_gm_code -> origin/sy_dump_gm_code 2025-12-04T08:57:44.1067724Z * [new branch] sy_exp -> origin/sy_exp 2025-12-04T08:57:44.1069018Z * [new branch] sy_export_annotation -> origin/sy_export_annotation 2025-12-04T08:57:44.1070087Z * [new branch] sy_invoke_subgraph -> origin/sy_invoke_subgraph 2025-12-04T08:57:44.1071212Z * [new branch] sy_kernel_bw_name -> origin/sy_kernel_bw_name 2025-12-04T08:57:44.1072871Z * [new branch] sy_multi_arch -> origin/sy_multi_arch 2025-12-04T08:57:44.1073983Z * [new branch] sy_nn_module_stack -> origin/sy_nn_module_stack 2025-12-04T08:57:44.1075150Z * [new branch] sy_original_dtensor -> origin/sy_original_dtensor 2025-12-04T08:57:44.1076290Z * [new branch] sy_profiler_cia -> origin/sy_profiler_cia 2025-12-04T08:57:44.1077446Z * [new branch] symm_mem_sync -> origin/symm_mem_sync 2025-12-04T08:57:44.1078827Z * [new branch] sympy-bottleneck-repro -> origin/sympy-bottleneck-repro 2025-12-04T08:57:44.1082986Z * 
[new branch] tensordict_integration -> origin/tensordict_integration 2025-12-04T08:57:44.1084322Z * [new branch] test-move-conda-builds -> origin/test-move-conda-builds 2025-12-04T08:57:44.1085485Z * [new branch] test-old -> origin/test-old 2025-12-04T08:57:44.1087102Z * [new branch] test/bmm_heur -> origin/test/bmm_heur 2025-12-04T08:57:44.1088797Z * [new branch] tianren/customOp_autotune_fix -> origin/tianren/customOp_autotune_fix 2025-12-04T08:57:44.1089900Z * [new branch] tianren/customOp_enable_max_autotune -> origin/tianren/customOp_enable_max_autotune 2025-12-04T08:57:44.1090885Z * [new branch] tianren/customOp_fusion -> origin/tianren/customOp_fusion 2025-12-04T08:57:44.1092244Z * [new branch] tianren/customop_collectiveop_benchmark -> origin/tianren/customop_collectiveop_benchmark 2025-12-04T08:57:44.1093900Z * [new branch] tianren/customop_collectiveop_benchmark_fix -> origin/tianren/customop_collectiveop_benchmark_fix 2025-12-04T08:57:44.1095375Z * [new branch] tianren/customop_dynamic_config -> origin/tianren/customop_dynamic_config 2025-12-04T08:57:44.1096448Z * [new branch] tianren/dynamic_range_input -> origin/tianren/dynamic_range_input 2025-12-04T08:57:44.1097686Z * [new branch] tianren/dynamic_range_input_fix -> origin/tianren/dynamic_range_input_fix 2025-12-04T08:57:44.1098803Z * [new branch] tianren/dynamic_range_input_merge -> origin/tianren/dynamic_range_input_merge 2025-12-04T08:57:44.1099917Z * [new branch] tianren/flex_paged_attn_fix_temp -> origin/tianren/flex_paged_attn_fix_temp 2025-12-04T08:57:44.1101036Z * [new branch] tianren/fx_codegen_dump -> origin/tianren/fx_codegen_dump 2025-12-04T08:57:44.1102195Z * [new branch] tianren/symmetric_memory -> origin/tianren/symmetric_memory 2025-12-04T08:57:44.1103251Z * [new branch] tianren/test -> origin/tianren/test 2025-12-04T08:57:44.1104578Z * [new branch] tidy_performance_cyy -> origin/tidy_performance_cyy 2025-12-04T08:57:44.1107416Z * [new branch] tmp -> origin/tmp 2025-12-04T08:57:44.1107962Z * [new branch] torchtitan_ep -> origin/torchtitan_ep 2025-12-04T08:57:44.1108740Z * [new branch] torchtitan_integration -> origin/torchtitan_integration 2025-12-04T08:57:44.1109558Z * [new branch] trace_fsdp_torchtune_lora -> origin/trace_fsdp_torchtune_lora 2025-12-04T08:57:44.1110807Z * [new branch] traceable_fsdp_unit_tests -> origin/traceable_fsdp_unit_tests 2025-12-04T08:57:44.1111892Z * [new branch] tree_loop_vec_base -> origin/tree_loop_vec_base 2025-12-04T08:57:44.1113035Z * [new branch] triton_kernel -> origin/triton_kernel 2025-12-04T08:57:44.1114343Z * [new branch] tt_pkg_1908 -> origin/tt_pkg_1908 2025-12-04T08:57:44.1115425Z * [new branch] type_dec -> origin/type_dec 2025-12-04T08:57:44.1116668Z * [new branch] udate-sphinx-dependancies -> origin/udate-sphinx-dependancies 2025-12-04T08:57:44.1118457Z * [new branch] update-audio-commit-hash/17630256502-1803-1 -> origin/update-audio-commit-hash/17630256502-1803-1 2025-12-04T08:57:44.1119539Z * [new branch] update-audio-commit-hash/19087141161-1916-1 -> origin/update-audio-commit-hash/19087141161-1916-1 2025-12-04T08:57:44.1120835Z * [new branch] update-audio-commit-hash/19250643381-1929-1 -> origin/update-audio-commit-hash/19250643381-1929-1 2025-12-04T08:57:44.1121886Z * [new branch] update-audio-commit-hash/19397724337-1935-1 -> origin/update-audio-commit-hash/19397724337-1935-1 2025-12-04T08:57:44.1122942Z * [new branch] update-audio-commit-hash/19555670148-1941-1 -> origin/update-audio-commit-hash/19555670148-1941-1 2025-12-04T08:57:44.1124242Z * [new branch] 
update-audio-commit-hash/19750627930-1946-1 -> origin/update-audio-commit-hash/19750627930-1946-1 2025-12-04T08:57:44.1125993Z * [new branch] update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2 2025-12-04T08:57:44.1127921Z * [new branch] update-vision-commit-hash/19087141161-1916-1 -> origin/update-vision-commit-hash/19087141161-1916-1 2025-12-04T08:57:44.1128922Z * [new branch] update-vision-commit-hash/19184897099-1925-1 -> origin/update-vision-commit-hash/19184897099-1925-1 2025-12-04T08:57:44.1130113Z * [new branch] update-vision-commit-hash/19250643381-1929-1 -> origin/update-vision-commit-hash/19250643381-1929-1 2025-12-04T08:57:44.1131151Z * [new branch] update-vision-commit-hash/19381328640-1934-1 -> origin/update-vision-commit-hash/19381328640-1934-1 2025-12-04T08:57:44.1132153Z * [new branch] update-vision-commit-hash/19485237164-1938-1 -> origin/update-vision-commit-hash/19485237164-1938-1 2025-12-04T08:57:44.1133869Z * [new branch] update-vllm-commit-hash/18451675449-1879-1 -> origin/update-vllm-commit-hash/18451675449-1879-1 2025-12-04T08:57:44.1135025Z * [new branch] update-vllm-dockerfile -> origin/update-vllm-dockerfile 2025-12-04T08:57:44.1136817Z * [new branch] update-xla-commit-hash/19224287370-211-1 -> origin/update-xla-commit-hash/19224287370-211-1 2025-12-04T08:57:44.1138286Z * [new branch] update-xla-commit-hash/19422028566-212-1 -> origin/update-xla-commit-hash/19422028566-212-1 2025-12-04T08:57:44.1139382Z * [new branch] update-xla-commit-hash/19626841311-213-1 -> origin/update-xla-commit-hash/19626841311-213-1 2025-12-04T08:57:44.1140621Z * [new branch] update_docs_torch_multinomial_issue#125388 -> origin/update_docs_torch_multinomial_issue#125388 2025-12-04T08:57:44.1141817Z * [new branch] update_operator_readme -> origin/update_operator_readme 2025-12-04T08:57:44.1143077Z * [new branch] update_slow_tests_1722488736 -> origin/update_slow_tests_1722488736 2025-12-04T08:57:44.1144316Z * [new branch] update_slow_tests_1722879173 -> origin/update_slow_tests_1722879173 2025-12-04T08:57:44.1145613Z * [new branch] update_slow_tests_1762155677 -> origin/update_slow_tests_1762155677 2025-12-04T08:57:44.1146786Z * [new branch] update_slow_tests_1763365283 -> origin/update_slow_tests_1763365283 2025-12-04T08:57:44.1148560Z * [new branch] update_submodule_FBGEMM -> origin/update_submodule_FBGEMM 2025-12-04T08:57:44.1149557Z * [new branch] update_submodule_kineto -> origin/update_submodule_kineto 2025-12-04T08:57:44.1150806Z * [new branch] update_submodule_tensorpipe -> origin/update_submodule_tensorpipe 2025-12-04T08:57:44.1151980Z * [new branch] upload-tests-for-autorevert -> origin/upload-tests-for-autorevert 2025-12-04T08:57:44.1153169Z * [new branch] v0.1.2 -> origin/v0.1.2 2025-12-04T08:57:44.1154538Z * [new branch] v1.0.1 -> origin/v1.0.1 2025-12-04T08:57:44.1155774Z * [new branch] v1.0.3 -> origin/v1.0.3 2025-12-04T08:57:44.1157039Z * [new branch] v1.1.0 -> origin/v1.1.0 2025-12-04T08:57:44.1158493Z * [new branch] v1.2.0 -> origin/v1.2.0 2025-12-04T08:57:44.1159721Z * [new branch] v1.3.0 -> origin/v1.3.0 2025-12-04T08:57:44.1160966Z * [new branch] v1.3.1 -> origin/v1.3.1 2025-12-04T08:57:44.1162252Z * [new branch] validate_fn -> origin/validate_fn 2025-12-04T08:57:44.1163621Z * [new branch] validations_2.6 -> origin/validations_2.6 2025-12-04T08:57:44.1164814Z * [new branch] validations_2.8 -> origin/validations_2.8 2025-12-04T08:57:44.1165943Z * [new branch] varlen-api -> origin/varlen-api 2025-12-04T08:57:44.1167143Z * 
[new branch] varlen-api-backup -> origin/varlen-api-backup 2025-12-04T08:57:44.1168254Z * [new branch] varlen_batch_invariance -> origin/varlen_batch_invariance 2025-12-04T08:57:44.1169636Z * [new branch] viable/strict -> origin/viable/strict 2025-12-04T08:57:44.1171353Z * [new branch] vishal9-team/dtensor_parallelism_toy -> origin/vishal9-team/dtensor_parallelism_toy 2025-12-04T08:57:44.1172317Z * [new branch] vllmbuildci -> origin/vllmbuildci 2025-12-04T08:57:44.1173894Z * [new branch] vllmpin -> origin/vllmpin 2025-12-04T08:57:44.1175308Z * [new branch] vscode-recommend-pyrefly -> origin/vscode-recommend-pyrefly 2025-12-04T08:57:44.1176473Z * [new branch] wdvr-patch-1 -> origin/wdvr-patch-1 2025-12-04T08:57:44.1178060Z * [new branch] wdvr/iss_145259 -> origin/wdvr/iss_145259 2025-12-04T08:57:44.1179911Z * [new branch] whc/pei -> origin/whc/pei 2025-12-04T08:57:44.1181008Z * [new branch] whc/pp_fix -> origin/whc/pp_fix 2025-12-04T08:57:44.1182210Z * [new branch] whc/sharding -> origin/whc/sharding 2025-12-04T08:57:44.1183250Z * [new branch] whc/sharding2 -> origin/whc/sharding2 2025-12-04T08:57:44.1184306Z * [new branch] whc/uneven -> origin/whc/uneven 2025-12-04T08:57:44.1185845Z * [new branch] whc/uneven-merge -> origin/whc/uneven-merge 2025-12-04T08:57:44.1187190Z * [new branch] win_warnings -> origin/win_warnings 2025-12-04T08:57:44.1188328Z * [new branch] windows_libtorch_free -> origin/windows_libtorch_free 2025-12-04T08:57:44.1189528Z * [new branch] xmfan-war -> origin/xmfan-war 2025-12-04T08:57:44.1191188Z * [new branch] xmfan/ca_0516 -> origin/xmfan/ca_0516 2025-12-04T08:57:44.1192232Z * [new branch] xmfan/ca_1051b93192 -> origin/xmfan/ca_1051b93192 2025-12-04T08:57:44.1193524Z * [new branch] xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 -> origin/xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 2025-12-04T08:57:44.1194382Z * [new branch] xmfan/ca_5a2be192d1 -> origin/xmfan/ca_5a2be192d1 2025-12-04T08:57:44.1195411Z * [new branch] xmfan/ca_9d59b516e9 -> origin/xmfan/ca_9d59b516e9 2025-12-04T08:57:44.1196403Z * [new branch] xmfan/ca_apr8 -> origin/xmfan/ca_apr8 2025-12-04T08:57:44.1197404Z * [new branch] xmfan/ca_base -> origin/xmfan/ca_base 2025-12-04T08:57:44.1198736Z * [new branch] xmfan/ca_dynamic -> origin/xmfan/ca_dynamic 2025-12-04T08:57:44.1200088Z * [new branch] xmfan/ca_fix_dyn -> origin/xmfan/ca_fix_dyn 2025-12-04T08:57:44.1201152Z * [new branch] xmfan/ca_fix_lowering -> origin/xmfan/ca_fix_lowering 2025-12-04T08:57:44.1202222Z * [new branch] xmfan/ca_fix_polyfills -> origin/xmfan/ca_fix_polyfills 2025-12-04T08:57:44.1203188Z * [new branch] xmfan/ca_jan3 -> origin/xmfan/ca_jan3 2025-12-04T08:57:44.1204269Z * [new branch] xmfan/ca_jun18 -> origin/xmfan/ca_jun18 2025-12-04T08:57:44.1205315Z * [new branch] xmfan/ca_jun24 -> origin/xmfan/ca_jun24 2025-12-04T08:57:44.1206990Z * [new branch] xmfan/ca_nested -> origin/xmfan/ca_nested 2025-12-04T08:57:44.1208030Z * [new branch] xmfan/ca_overhead -> origin/xmfan/ca_overhead 2025-12-04T08:57:44.1209169Z * [new branch] xmfan/ca_overhead_0eba7e5451 -> origin/xmfan/ca_overhead_0eba7e5451 2025-12-04T08:57:44.1210153Z * [new branch] xmfan/cacu_jun18 -> origin/xmfan/cacu_jun18 2025-12-04T08:57:44.1211220Z * [new branch] xmfan/cacu_jun19 -> origin/xmfan/cacu_jun19 2025-12-04T08:57:44.1212296Z * [new branch] xmfan/cacu_jun4 -> origin/xmfan/cacu_jun4 2025-12-04T08:57:44.1213652Z * [new branch] xmfan/disable_duck_shape -> origin/xmfan/disable_duck_shape 2025-12-04T08:57:44.1215578Z * [new branch] xmfan/fca_cpp_node_passthrough -> 
origin/xmfan/fca_cpp_node_passthrough 2025-12-04T08:57:44.1216871Z * [new branch] xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 2025-12-04T08:57:44.1218375Z * [new branch] xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 2025-12-04T08:57:44.1219306Z * [new branch] xmfan/single_step -> origin/xmfan/single_step 2025-12-04T08:57:44.1219933Z * [new branch] xmfan/sth_0829 -> origin/xmfan/sth_0829 2025-12-04T08:57:44.1221149Z * [new branch] xmfan/test -> origin/xmfan/test 2025-12-04T08:57:44.1222859Z * [new branch] yguo/debug-0226-constexpr -> origin/yguo/debug-0226-constexpr 2025-12-04T08:57:44.1223846Z * [new branch] yguo/new_latest_changes -> origin/yguo/new_latest_changes 2025-12-04T08:57:44.1224948Z * [new branch] yguo/patch_constexpr_changes -> origin/yguo/patch_constexpr_changes 2025-12-04T08:57:44.1226499Z * [new branch] yiming/bootcamp -> origin/yiming/bootcamp 2025-12-04T08:57:44.1227665Z * [new branch] yiming/run_with_start_end_rng_hop -> origin/yiming/run_with_start_end_rng_hop 2025-12-04T08:57:44.1228763Z * [new branch] yolo-llama3 -> origin/yolo-llama3 2025-12-04T08:57:44.1230332Z * [new branch] zainr/canary-test -> origin/zainr/canary-test 2025-12-04T08:57:44.1231563Z * [new branch] zainr/cleanup-gh-runners -> origin/zainr/cleanup-gh-runners 2025-12-04T08:57:44.1232559Z * [new branch] zainr/pull-migration-c -> origin/zainr/pull-migration-c 2025-12-04T08:57:44.1234035Z * [new branch] zainr/test2 -> origin/zainr/test2 2025-12-04T08:57:44.1235381Z * [new branch] zasdfgbnm-patch-3 -> origin/zasdfgbnm-patch-3 2025-12-04T08:57:44.1236700Z * [new branch] zb2p -> origin/zb2p 2025-12-04T08:57:44.1237660Z * [new branch] zeros-and-scatter-part2 -> origin/zeros-and-scatter-part2 2025-12-04T08:57:44.1239490Z * [new branch] zhxchen17/ci/vllm_lora_oom -> origin/zhxchen17/ci/vllm_lora_oom 2025-12-04T08:57:44.1240578Z * [new branch] zhxchen17/ci/vllm_multimodal_oom -> origin/zhxchen17/ci/vllm_multimodal_oom 2025-12-04T08:57:44.1241546Z * [new branch] zhxchen17/ci/vllm_pin -> origin/zhxchen17/ci/vllm_pin 2025-12-04T08:57:44.1243248Z * [new branch] zhxchen17/dynamo/unsafe_drop_all_guards -> origin/zhxchen17/dynamo/unsafe_drop_all_guards 2025-12-04T08:57:44.1245035Z * [new branch] zhxchen17/export/call_override -> origin/zhxchen17/export/call_override 2025-12-04T08:57:44.1246037Z * [new branch] zhxchen17/export/codemod1 -> origin/zhxchen17/export/codemod1 2025-12-04T08:57:44.1247297Z * [new branch] zhxchen17/export/ctx_return -> origin/zhxchen17/export/ctx_return 2025-12-04T08:57:44.1248518Z * [new branch] zhxchen17/export/disable_side_effect_warn -> origin/zhxchen17/export/disable_side_effect_warn 2025-12-04T08:57:44.1249625Z * [new branch] zhxchen17/export/pytree_check -> origin/zhxchen17/export/pytree_check 2025-12-04T08:57:44.1251072Z * [new branch] zhxchen17/precompile/aoti -> origin/zhxchen17/precompile/aoti 2025-12-04T08:57:44.1252152Z * [new branch] zhxchen17/precompile/globals -> origin/zhxchen17/precompile/globals 2025-12-04T08:57:44.1253361Z * [new branch] zhxchen17/precompile/inductor_guards -> origin/zhxchen17/precompile/inductor_guards 2025-12-04T08:57:44.1254931Z * [new branch] zhxchen17/scratch/0 -> origin/zhxchen17/scratch/0 2025-12-04T08:57:44.1256167Z * [new branch] zhxchen17/torch_export_api_update -> origin/zhxchen17/torch_export_api_update 2025-12-04T08:57:44.1257858Z * [new branch] zhxhcen17/moodycamel -> origin/zhxhcen17/moodycamel 
2025-12-04T08:57:44.1259429Z * [new branch] zxiiro/build-times -> origin/zxiiro/build-times 2025-12-04T08:57:44.1278835Z * [new branch] zxiiro/c7i.2xlarge -> origin/zxiiro/c7i.2xlarge 2025-12-04T08:57:44.1279706Z * [new branch] zxiiro/c7i.2xlarge.h100 -> origin/zxiiro/c7i.2xlarge.h100 2025-12-04T08:57:44.1280320Z * [new branch] zxiiro/main -> origin/zxiiro/main 2025-12-04T08:57:44.1280888Z * [new branch] zxiiro/risc64 -> origin/zxiiro/risc64 2025-12-04T08:57:44.1281546Z * [new branch] zxiiro/test-multicloud-arc -> origin/zxiiro/test-multicloud-arc 2025-12-04T08:57:44.1282547Z * [new tag] bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug -> bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug 2025-12-04T08:57:44.1283423Z * [new tag] ci/binaries/77164 -> ci/binaries/77164 2025-12-04T08:57:44.1283952Z * [new tag] ciflow/b200/115316 -> ciflow/b200/115316 2025-12-04T08:57:44.1284469Z * [new tag] ciflow/b200/160685 -> ciflow/b200/160685 2025-12-04T08:57:44.1284972Z * [new tag] ciflow/b200/161607 -> ciflow/b200/161607 2025-12-04T08:57:44.1285462Z * [new tag] ciflow/b200/161938 -> ciflow/b200/161938 2025-12-04T08:57:44.1285960Z * [new tag] ciflow/b200/167207 -> ciflow/b200/167207 2025-12-04T08:57:44.1286459Z * [new tag] ciflow/b200/167989 -> ciflow/b200/167989 2025-12-04T08:57:44.1286949Z * [new tag] ciflow/b200/168096 -> ciflow/b200/168096 2025-12-04T08:57:44.1287451Z * [new tag] ciflow/b200/168175 -> ciflow/b200/168175 2025-12-04T08:57:44.1287955Z * [new tag] ciflow/b200/168195 -> ciflow/b200/168195 2025-12-04T08:57:44.1288655Z * [new tag] ciflow/b200/169200 -> ciflow/b200/169200 2025-12-04T08:57:44.1289233Z * [new tag] ciflow/b200/169216 -> ciflow/b200/169216 2025-12-04T08:57:44.1289736Z * [new tag] ciflow/b200/169380 -> ciflow/b200/169380 2025-12-04T08:57:44.1290243Z * [new tag] ciflow/b200/169412 -> ciflow/b200/169412 2025-12-04T08:57:44.1290755Z * [new tag] ciflow/b200/169470 -> ciflow/b200/169470 2025-12-04T08:57:44.1291253Z * [new tag] ciflow/b200/169471 -> ciflow/b200/169471 2025-12-04T08:57:44.1291851Z * [new tag] ciflow/b200/169472 -> ciflow/b200/169472 2025-12-04T08:57:44.1292329Z * [new tag] ciflow/b200/169514 -> ciflow/b200/169514 2025-12-04T08:57:44.1292813Z * [new tag] ciflow/b200/169517 -> ciflow/b200/169517 2025-12-04T08:57:44.1293407Z * [new tag] ciflow/binaries/165922 -> ciflow/binaries/165922 2025-12-04T08:57:44.1294139Z * [new tag] ciflow/binaries/169510 -> ciflow/binaries/169510 2025-12-04T08:57:44.1294745Z * [new tag] ciflow/binaries_wheel/157994 -> ciflow/binaries_wheel/157994 2025-12-04T08:57:44.1295364Z * [new tag] ciflow/binaries_wheel/166829 -> ciflow/binaries_wheel/166829 2025-12-04T08:57:44.1295993Z * [new tag] ciflow/binaries_wheel/167972 -> ciflow/binaries_wheel/167972 2025-12-04T08:57:44.1296618Z * [new tag] ciflow/binaries_wheel/167981 -> ciflow/binaries_wheel/167981 2025-12-04T08:57:44.1297201Z * [new tag] ciflow/dynamo/167695 -> ciflow/dynamo/167695 2025-12-04T08:57:44.1297727Z * [new tag] ciflow/dynamo/168096 -> ciflow/dynamo/168096 2025-12-04T08:57:44.1298258Z * [new tag] ciflow/dynamo/169525 -> ciflow/dynamo/169525 2025-12-04T08:57:44.1298908Z * [new tag] ciflow/h100-cutlass-backend/161938 -> ciflow/h100-cutlass-backend/161938 2025-12-04T08:57:44.1299673Z * [new tag] ciflow/h100-cutlass-backend/161940 -> ciflow/h100-cutlass-backend/161940 2025-12-04T08:57:44.1300385Z * [new tag] ciflow/h100-distributed/168923 -> ciflow/h100-distributed/168923 2025-12-04T08:57:44.1301024Z * [new tag] ciflow/h100-symm-mem/167552 -> ciflow/h100-symm-mem/167552 
2025-12-04T08:57:44.1301626Z * [new tag] ciflow/h100-symm-mem/168129 -> ciflow/h100-symm-mem/168129 2025-12-04T08:57:44.1302224Z * [new tag] ciflow/h100-symm-mem/168917 -> ciflow/h100-symm-mem/168917 2025-12-04T08:57:44.1302809Z * [new tag] ciflow/h100-symm-mem/169156 -> ciflow/h100-symm-mem/169156 2025-12-04T08:57:44.1303410Z * [new tag] ciflow/h100-symm-mem/169200 -> ciflow/h100-symm-mem/169200 2025-12-04T08:57:44.1304006Z * [new tag] ciflow/h100-symm-mem/169216 -> ciflow/h100-symm-mem/169216 2025-12-04T08:57:44.1304592Z * [new tag] ciflow/h100-symm-mem/169338 -> ciflow/h100-symm-mem/169338 2025-12-04T08:57:44.1305197Z * [new tag] ciflow/h100-symm-mem/169355 -> ciflow/h100-symm-mem/169355 2025-12-04T08:57:44.1305884Z * [new tag] ciflow/h100-symm-mem/169543 -> ciflow/h100-symm-mem/169543 2025-12-04T08:57:44.1306423Z * [new tag] ciflow/h100/115316 -> ciflow/h100/115316 2025-12-04T08:57:44.1306902Z * [new tag] ciflow/h100/160685 -> ciflow/h100/160685 2025-12-04T08:57:44.1307391Z * [new tag] ciflow/h100/160729 -> ciflow/h100/160729 2025-12-04T08:57:44.1307880Z * [new tag] ciflow/h100/161607 -> ciflow/h100/161607 2025-12-04T08:57:44.1308368Z * [new tag] ciflow/h100/161938 -> ciflow/h100/161938 2025-12-04T08:57:44.1308847Z * [new tag] ciflow/h100/167207 -> ciflow/h100/167207 2025-12-04T08:57:44.1309408Z * [new tag] ciflow/h100/167989 -> ciflow/h100/167989 2025-12-04T08:57:44.1309950Z * [new tag] ciflow/h100/168096 -> ciflow/h100/168096 2025-12-04T08:57:44.1310432Z * [new tag] ciflow/h100/168175 -> ciflow/h100/168175 2025-12-04T08:57:44.1310911Z * [new tag] ciflow/h100/168195 -> ciflow/h100/168195 2025-12-04T08:57:44.1311398Z * [new tag] ciflow/h100/168980 -> ciflow/h100/168980 2025-12-04T08:57:44.1311894Z * [new tag] ciflow/h100/169200 -> ciflow/h100/169200 2025-12-04T08:57:44.1312366Z * [new tag] ciflow/h100/169216 -> ciflow/h100/169216 2025-12-04T08:57:44.1312853Z * [new tag] ciflow/h100/169380 -> ciflow/h100/169380 2025-12-04T08:57:44.1313339Z * [new tag] ciflow/h100/169412 -> ciflow/h100/169412 2025-12-04T08:57:44.1313829Z * [new tag] ciflow/h100/169470 -> ciflow/h100/169470 2025-12-04T08:57:44.1314311Z * [new tag] ciflow/h100/169471 -> ciflow/h100/169471 2025-12-04T08:57:44.1314809Z * [new tag] ciflow/h100/169472 -> ciflow/h100/169472 2025-12-04T08:57:44.1315488Z * [new tag] ciflow/h100/169514 -> ciflow/h100/169514 2025-12-04T08:57:44.1316356Z * [new tag] ciflow/inductor-cu126/168096 -> ciflow/inductor-cu126/168096 2025-12-04T08:57:44.1317560Z * [new tag] ciflow/inductor-micro-benchmark-cpu-x86/168096 -> ciflow/inductor-micro-benchmark-cpu-x86/168096 2025-12-04T08:57:44.1318489Z * [new tag] ciflow/inductor-micro-benchmark/166165 -> ciflow/inductor-micro-benchmark/166165 2025-12-04T08:57:44.1319334Z * [new tag] ciflow/inductor-micro-benchmark/168096 -> ciflow/inductor-micro-benchmark/168096 2025-12-04T08:57:44.1320148Z * [new tag] ciflow/inductor-perf-compare/168096 -> ciflow/inductor-perf-compare/168096 2025-12-04T08:57:44.1321167Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi300/168073 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168073 2025-12-04T08:57:44.1322287Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi300/168096 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168096 2025-12-04T08:57:44.1323404Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi300/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi300/169024 2025-12-04T08:57:44.1324516Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi355/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi355/169024 
2025-12-04T08:57:44.1325513Z * [new tag] ciflow/inductor-perf-test-nightly/168096 -> ciflow/inductor-perf-test-nightly/168096 2025-12-04T08:57:44.1326314Z * [new tag] ciflow/inductor-periodic/168096 -> ciflow/inductor-periodic/168096 2025-12-04T08:57:44.1327002Z * [new tag] ciflow/inductor-periodic/169024 -> ciflow/inductor-periodic/169024 2025-12-04T08:57:44.1327700Z * [new tag] ciflow/inductor-periodic/169425 -> ciflow/inductor-periodic/169425 2025-12-04T08:57:44.1328412Z * [new tag] ciflow/inductor-rocm-mi200/165545 -> ciflow/inductor-rocm-mi200/165545 2025-12-04T08:57:44.1329173Z * [new tag] ciflow/inductor-rocm-mi200/165997 -> ciflow/inductor-rocm-mi200/165997 2025-12-04T08:57:44.1329879Z * [new tag] ciflow/inductor-rocm-mi200/168096 -> ciflow/inductor-rocm-mi200/168096 2025-12-04T08:57:44.1330597Z * [new tag] ciflow/inductor-rocm-mi200/169063 -> ciflow/inductor-rocm-mi200/169063 2025-12-04T08:57:44.1331316Z * [new tag] ciflow/inductor-rocm-mi200/169425 -> ciflow/inductor-rocm-mi200/169425 2025-12-04T08:57:44.1332035Z * [new tag] ciflow/inductor-rocm-mi300/165545 -> ciflow/inductor-rocm-mi300/165545 2025-12-04T08:57:44.1332873Z * [new tag] ciflow/inductor-rocm-mi300/168096 -> ciflow/inductor-rocm-mi300/168096 2025-12-04T08:57:44.1333853Z * [new tag] ciflow/inductor-rocm-mi300/169063 -> ciflow/inductor-rocm-mi300/169063 2025-12-04T08:57:44.1334660Z * [new tag] ciflow/inductor-rocm-mi300/169425 -> ciflow/inductor-rocm-mi300/169425 2025-12-04T08:57:44.1335342Z * [new tag] ciflow/inductor-rocm/162052 -> ciflow/inductor-rocm/162052 2025-12-04T08:57:44.1335970Z * [new tag] ciflow/inductor-rocm/168971 -> ciflow/inductor-rocm/168971 2025-12-04T08:57:44.1336691Z * [new tag] ciflow/inductor-windows/168096 -> ciflow/inductor-windows/168096 2025-12-04T08:57:44.1337463Z * [new tag] ciflow/inductor/144542 -> ciflow/inductor/144542 2025-12-04T08:57:44.1338201Z * [new tag] ciflow/inductor/146506 -> ciflow/inductor/146506 2025-12-04T08:57:44.1338924Z * [new tag] ciflow/inductor/147990 -> ciflow/inductor/147990 2025-12-04T08:57:44.1339791Z * [new tag] ciflow/inductor/148294 -> ciflow/inductor/148294 2025-12-04T08:57:44.1340501Z * [new tag] ciflow/inductor/148492 -> ciflow/inductor/148492 2025-12-04T08:57:44.1341235Z * [new tag] ciflow/inductor/157149 -> ciflow/inductor/157149 2025-12-04T08:57:44.1341950Z * [new tag] ciflow/inductor/157994 -> ciflow/inductor/157994 2025-12-04T08:57:44.1342651Z * [new tag] ciflow/inductor/160685 -> ciflow/inductor/160685 2025-12-04T08:57:44.1343354Z * [new tag] ciflow/inductor/160686 -> ciflow/inductor/160686 2025-12-04T08:57:44.1344072Z * [new tag] ciflow/inductor/160687 -> ciflow/inductor/160687 2025-12-04T08:57:44.1344836Z * [new tag] ciflow/inductor/160688 -> ciflow/inductor/160688 2025-12-04T08:57:44.1345924Z * [new tag] ciflow/inductor/160706 -> ciflow/inductor/160706 2025-12-04T08:57:44.1346900Z * [new tag] ciflow/inductor/160729 -> ciflow/inductor/160729 2025-12-04T08:57:44.1347931Z * [new tag] ciflow/inductor/161938 -> ciflow/inductor/161938 2025-12-04T08:57:44.1348768Z * [new tag] ciflow/inductor/161939 -> ciflow/inductor/161939 2025-12-04T08:57:44.1349510Z * [new tag] ciflow/inductor/161940 -> ciflow/inductor/161940 2025-12-04T08:57:44.1350160Z * [new tag] ciflow/inductor/162052 -> ciflow/inductor/162052 2025-12-04T08:57:44.1351310Z * [new tag] ciflow/inductor/162275 -> ciflow/inductor/162275 2025-12-04T08:57:44.1352084Z * [new tag] ciflow/inductor/162795 -> ciflow/inductor/162795 2025-12-04T08:57:44.1353005Z * [new tag] ciflow/inductor/163245 -> 
ciflow/inductor/163245 2025-12-04T08:57:44.1353752Z * [new tag] ciflow/inductor/163335 -> ciflow/inductor/163335 2025-12-04T08:57:44.1354493Z * [new tag] ciflow/inductor/163503 -> ciflow/inductor/163503 2025-12-04T08:57:44.1355251Z * [new tag] ciflow/inductor/163942 -> ciflow/inductor/163942 2025-12-04T08:57:44.1356104Z * [new tag] ciflow/inductor/165270 -> ciflow/inductor/165270 2025-12-04T08:57:44.1356852Z * [new tag] ciflow/inductor/165274 -> ciflow/inductor/165274 2025-12-04T08:57:44.1357598Z * [new tag] ciflow/inductor/165322 -> ciflow/inductor/165322 2025-12-04T08:57:44.1358354Z * [new tag] ciflow/inductor/165597 -> ciflow/inductor/165597 2025-12-04T08:57:44.1359092Z * [new tag] ciflow/inductor/166063 -> ciflow/inductor/166063 2025-12-04T08:57:44.1359857Z * [new tag] ciflow/inductor/166075 -> ciflow/inductor/166075 2025-12-04T08:57:44.1360595Z * [new tag] ciflow/inductor/166165 -> ciflow/inductor/166165 2025-12-04T08:57:44.1361738Z * [new tag] ciflow/inductor/166254 -> ciflow/inductor/166254 2025-12-04T08:57:44.1362354Z * [new tag] ciflow/inductor/166483 -> ciflow/inductor/166483 2025-12-04T08:57:44.1363079Z * [new tag] ciflow/inductor/166494 -> ciflow/inductor/166494 2025-12-04T08:57:44.1363830Z * [new tag] ciflow/inductor/166545 -> ciflow/inductor/166545 2025-12-04T08:57:44.1364554Z * [new tag] ciflow/inductor/166788 -> ciflow/inductor/166788 2025-12-04T08:57:44.1365428Z * [new tag] ciflow/inductor/166846 -> ciflow/inductor/166846 2025-12-04T08:57:44.1366163Z * [new tag] ciflow/inductor/167300 -> ciflow/inductor/167300 2025-12-04T08:57:44.1366931Z * [new tag] ciflow/inductor/167407 -> ciflow/inductor/167407 2025-12-04T08:57:44.1367765Z * [new tag] ciflow/inductor/167536 -> ciflow/inductor/167536 2025-12-04T08:57:44.1368511Z * [new tag] ciflow/inductor/167552 -> ciflow/inductor/167552 2025-12-04T08:57:44.1369274Z * [new tag] ciflow/inductor/167555 -> ciflow/inductor/167555 2025-12-04T08:57:44.1370135Z * [new tag] ciflow/inductor/167583 -> ciflow/inductor/167583 2025-12-04T08:57:44.1370870Z * [new tag] ciflow/inductor/167599 -> ciflow/inductor/167599 2025-12-04T08:57:44.1371593Z * [new tag] ciflow/inductor/167647 -> ciflow/inductor/167647 2025-12-04T08:57:44.1372340Z * [new tag] ciflow/inductor/167677 -> ciflow/inductor/167677 2025-12-04T08:57:44.1373263Z * [new tag] ciflow/inductor/167680 -> ciflow/inductor/167680 2025-12-04T08:57:44.1374299Z * [new tag] ciflow/inductor/167695 -> ciflow/inductor/167695 2025-12-04T08:57:44.1375002Z * [new tag] ciflow/inductor/167742 -> ciflow/inductor/167742 2025-12-04T08:57:44.1375772Z * [new tag] ciflow/inductor/167768 -> ciflow/inductor/167768 2025-12-04T08:57:44.1376822Z * [new tag] ciflow/inductor/167773 -> ciflow/inductor/167773 2025-12-04T08:57:44.1377595Z * [new tag] ciflow/inductor/167781 -> ciflow/inductor/167781 2025-12-04T08:57:44.1378376Z * [new tag] ciflow/inductor/167880 -> ciflow/inductor/167880 2025-12-04T08:57:44.1379887Z * [new tag] ciflow/inductor/167887 -> ciflow/inductor/167887 2025-12-04T08:57:44.1380469Z * [new tag] ciflow/inductor/167972 -> ciflow/inductor/167972 2025-12-04T08:57:44.1381058Z * [new tag] ciflow/inductor/167989 -> ciflow/inductor/167989 2025-12-04T08:57:44.1381824Z * [new tag] ciflow/inductor/168002 -> ciflow/inductor/168002 2025-12-04T08:57:44.1382588Z * [new tag] ciflow/inductor/168050 -> ciflow/inductor/168050 2025-12-04T08:57:44.1383370Z * [new tag] ciflow/inductor/168051 -> ciflow/inductor/168051 2025-12-04T08:57:44.1384114Z * [new tag] ciflow/inductor/168052 -> ciflow/inductor/168052 
2025-12-04T08:57:44.1384894Z * [new tag] ciflow/inductor/168073 -> ciflow/inductor/168073 2025-12-04T08:57:44.1385655Z * [new tag] ciflow/inductor/168096 -> ciflow/inductor/168096 2025-12-04T08:57:44.1386478Z * [new tag] ciflow/inductor/168114 -> ciflow/inductor/168114 2025-12-04T08:57:44.1387192Z * [new tag] ciflow/inductor/168115 -> ciflow/inductor/168115 2025-12-04T08:57:44.1387992Z * [new tag] ciflow/inductor/168127 -> ciflow/inductor/168127 2025-12-04T08:57:44.1388681Z * [new tag] ciflow/inductor/168129 -> ciflow/inductor/168129 2025-12-04T08:57:44.1389472Z * [new tag] ciflow/inductor/168157 -> ciflow/inductor/168157 2025-12-04T08:57:44.1390193Z * [new tag] ciflow/inductor/168175 -> ciflow/inductor/168175 2025-12-04T08:57:44.1391216Z * [new tag] ciflow/inductor/168185 -> ciflow/inductor/168185 2025-12-04T08:57:44.1391828Z * [new tag] ciflow/inductor/168195 -> ciflow/inductor/168195 2025-12-04T08:57:44.1393003Z * [new tag] ciflow/inductor/168209 -> ciflow/inductor/168209 2025-12-04T08:57:44.1393321Z * [new tag] ciflow/inductor/168266 -> ciflow/inductor/168266 2025-12-04T08:57:44.1394084Z * [new tag] ciflow/inductor/168316 -> ciflow/inductor/168316 2025-12-04T08:57:44.1395021Z * [new tag] ciflow/inductor/168326 -> ciflow/inductor/168326 2025-12-04T08:57:44.1395714Z * [new tag] ciflow/inductor/168368 -> ciflow/inductor/168368 2025-12-04T08:57:44.1396414Z * [new tag] ciflow/inductor/168894 -> ciflow/inductor/168894 2025-12-04T08:57:44.1397161Z * [new tag] ciflow/inductor/168934 -> ciflow/inductor/168934 2025-12-04T08:57:44.1397920Z * [new tag] ciflow/inductor/168939 -> ciflow/inductor/168939 2025-12-04T08:57:44.1398811Z * [new tag] ciflow/inductor/168946 -> ciflow/inductor/168946 2025-12-04T08:57:44.1399513Z * [new tag] ciflow/inductor/168950 -> ciflow/inductor/168950 2025-12-04T08:57:44.1400235Z * [new tag] ciflow/inductor/168951 -> ciflow/inductor/168951 2025-12-04T08:57:44.1400994Z * [new tag] ciflow/inductor/168952 -> ciflow/inductor/168952 2025-12-04T08:57:44.1402233Z * [new tag] ciflow/inductor/168955 -> ciflow/inductor/168955 2025-12-04T08:57:44.1402917Z * [new tag] ciflow/inductor/168971 -> ciflow/inductor/168971 2025-12-04T08:57:44.1403667Z * [new tag] ciflow/inductor/168979 -> ciflow/inductor/168979 2025-12-04T08:57:44.1404407Z * [new tag] ciflow/inductor/168980 -> ciflow/inductor/168980 2025-12-04T08:57:44.1405352Z * [new tag] ciflow/inductor/168983 -> ciflow/inductor/168983 2025-12-04T08:57:44.1406036Z * [new tag] ciflow/inductor/169006 -> ciflow/inductor/169006 2025-12-04T08:57:44.1406772Z * [new tag] ciflow/inductor/169023 -> ciflow/inductor/169023 2025-12-04T08:57:44.1407496Z * [new tag] ciflow/inductor/169024 -> ciflow/inductor/169024 2025-12-04T08:57:44.1408511Z * [new tag] ciflow/inductor/169025 -> ciflow/inductor/169025 2025-12-04T08:57:44.1409283Z * [new tag] ciflow/inductor/169066 -> ciflow/inductor/169066 2025-12-04T08:57:44.1410025Z * [new tag] ciflow/inductor/169091 -> ciflow/inductor/169091 2025-12-04T08:57:44.1410781Z * [new tag] ciflow/inductor/169102 -> ciflow/inductor/169102 2025-12-04T08:57:44.1411484Z * [new tag] ciflow/inductor/169103 -> ciflow/inductor/169103 2025-12-04T08:57:44.1412230Z * [new tag] ciflow/inductor/169121 -> ciflow/inductor/169121 2025-12-04T08:57:44.1413063Z * [new tag] ciflow/inductor/169134 -> ciflow/inductor/169134 2025-12-04T08:57:44.1414070Z * [new tag] ciflow/inductor/169135 -> ciflow/inductor/169135 2025-12-04T08:57:44.1414847Z * [new tag] ciflow/inductor/169141 -> ciflow/inductor/169141 2025-12-04T08:57:44.1415594Z * [new tag] 
ciflow/inductor/169151 -> ciflow/inductor/169151 2025-12-04T08:57:44.1416578Z * [new tag] ciflow/inductor/169161 -> ciflow/inductor/169161 2025-12-04T08:57:44.1417279Z * [new tag] ciflow/inductor/169167 -> ciflow/inductor/169167 2025-12-04T08:57:44.1418294Z * [new tag] ciflow/inductor/169177 -> ciflow/inductor/169177 2025-12-04T08:57:44.1419152Z * [new tag] ciflow/inductor/169185 -> ciflow/inductor/169185 2025-12-04T08:57:44.1419920Z * [new tag] ciflow/inductor/169196 -> ciflow/inductor/169196 2025-12-04T08:57:44.1420830Z * [new tag] ciflow/inductor/169200 -> ciflow/inductor/169200 2025-12-04T08:57:44.1421449Z * [new tag] ciflow/inductor/169204 -> ciflow/inductor/169204 2025-12-04T08:57:44.1422212Z * [new tag] ciflow/inductor/169216 -> ciflow/inductor/169216 2025-12-04T08:57:44.1422952Z * [new tag] ciflow/inductor/169219 -> ciflow/inductor/169219 2025-12-04T08:57:44.1423710Z * [new tag] ciflow/inductor/169220 -> ciflow/inductor/169220 2025-12-04T08:57:44.1424832Z * [new tag] ciflow/inductor/169230 -> ciflow/inductor/169230 2025-12-04T08:57:44.1425626Z * [new tag] ciflow/inductor/169242 -> ciflow/inductor/169242 2025-12-04T08:57:44.1426385Z * [new tag] ciflow/inductor/169245 -> ciflow/inductor/169245 2025-12-04T08:57:44.1427311Z * [new tag] ciflow/inductor/169260 -> ciflow/inductor/169260 2025-12-04T08:57:44.1428021Z * [new tag] ciflow/inductor/169282 -> ciflow/inductor/169282 2025-12-04T08:57:44.1428748Z * [new tag] ciflow/inductor/169286 -> ciflow/inductor/169286 2025-12-04T08:57:44.1429494Z * [new tag] ciflow/inductor/169299 -> ciflow/inductor/169299 2025-12-04T08:57:44.1430435Z * [new tag] ciflow/inductor/169304 -> ciflow/inductor/169304 2025-12-04T08:57:44.1431593Z * [new tag] ciflow/inductor/169305 -> ciflow/inductor/169305 2025-12-04T08:57:44.1432307Z * [new tag] ciflow/inductor/169308 -> ciflow/inductor/169308 2025-12-04T08:57:44.1433028Z * [new tag] ciflow/inductor/169319 -> ciflow/inductor/169319 2025-12-04T08:57:44.1433790Z * [new tag] ciflow/inductor/169326 -> ciflow/inductor/169326 2025-12-04T08:57:44.1434518Z * [new tag] ciflow/inductor/169332 -> ciflow/inductor/169332 2025-12-04T08:57:44.1435246Z * [new tag] ciflow/inductor/169333 -> ciflow/inductor/169333 2025-12-04T08:57:44.1436322Z * [new tag] ciflow/inductor/169336 -> ciflow/inductor/169336 2025-12-04T08:57:44.1436994Z * [new tag] ciflow/inductor/169340 -> ciflow/inductor/169340 2025-12-04T08:57:44.1437723Z * [new tag] ciflow/inductor/169341 -> ciflow/inductor/169341 2025-12-04T08:57:44.1438500Z * [new tag] ciflow/inductor/169343 -> ciflow/inductor/169343 2025-12-04T08:57:44.1439253Z * [new tag] ciflow/inductor/169346 -> ciflow/inductor/169346 2025-12-04T08:57:44.1440205Z * [new tag] ciflow/inductor/169348 -> ciflow/inductor/169348 2025-12-04T08:57:44.1440985Z * [new tag] ciflow/inductor/169350 -> ciflow/inductor/169350 2025-12-04T08:57:44.1441743Z * [new tag] ciflow/inductor/169355 -> ciflow/inductor/169355 2025-12-04T08:57:44.1442486Z * [new tag] ciflow/inductor/169370 -> ciflow/inductor/169370 2025-12-04T08:57:44.1443640Z * [new tag] ciflow/inductor/169375 -> ciflow/inductor/169375 2025-12-04T08:57:44.1444308Z * [new tag] ciflow/inductor/169389 -> ciflow/inductor/169389 2025-12-04T08:57:44.1445052Z * [new tag] ciflow/inductor/169391 -> ciflow/inductor/169391 2025-12-04T08:57:44.1445805Z * [new tag] ciflow/inductor/169393 -> ciflow/inductor/169393 2025-12-04T08:57:44.1446555Z * [new tag] ciflow/inductor/169399 -> ciflow/inductor/169399 2025-12-04T08:57:44.1447496Z * [new tag] ciflow/inductor/169400 -> 
ciflow/inductor/169400 2025-12-04T08:57:44.1448166Z * [new tag] ciflow/inductor/169415 -> ciflow/inductor/169415 2025-12-04T08:57:44.1448955Z * [new tag] ciflow/inductor/169417 -> ciflow/inductor/169417 2025-12-04T08:57:44.1449753Z * [new tag] ciflow/inductor/169418 -> ciflow/inductor/169418 2025-12-04T08:57:44.1450814Z * [new tag] ciflow/inductor/169430 -> ciflow/inductor/169430 2025-12-04T08:57:44.1451508Z * [new tag] ciflow/inductor/169432 -> ciflow/inductor/169432 2025-12-04T08:57:44.1452226Z * [new tag] ciflow/inductor/169436 -> ciflow/inductor/169436 2025-12-04T08:57:44.1453258Z * [new tag] ciflow/inductor/169437 -> ciflow/inductor/169437 2025-12-04T08:57:44.1454688Z * [new tag] ciflow/inductor/169438 -> ciflow/inductor/169438 2025-12-04T08:57:44.1455403Z * [new tag] ciflow/inductor/169441 -> ciflow/inductor/169441 2025-12-04T08:57:44.1456165Z * [new tag] ciflow/inductor/169446 -> ciflow/inductor/169446 2025-12-04T08:57:44.1457115Z * [new tag] ciflow/inductor/169447 -> ciflow/inductor/169447 2025-12-04T08:57:44.1457900Z * [new tag] ciflow/inductor/169452 -> ciflow/inductor/169452 2025-12-04T08:57:44.1458834Z * [new tag] ciflow/inductor/169455 -> ciflow/inductor/169455 2025-12-04T08:57:44.1459562Z * [new tag] ciflow/inductor/169459 -> ciflow/inductor/169459 2025-12-04T08:57:44.1460492Z * [new tag] ciflow/inductor/169463 -> ciflow/inductor/169463 2025-12-04T08:57:44.1461337Z * [new tag] ciflow/inductor/169476 -> ciflow/inductor/169476 2025-12-04T08:57:44.1462155Z * [new tag] ciflow/inductor/169485 -> ciflow/inductor/169485 2025-12-04T08:57:44.1462911Z * [new tag] ciflow/inductor/169493 -> ciflow/inductor/169493 2025-12-04T08:57:44.1463684Z * [new tag] ciflow/inductor/169496 -> ciflow/inductor/169496 2025-12-04T08:57:44.1464460Z * [new tag] ciflow/inductor/169497 -> ciflow/inductor/169497 2025-12-04T08:57:44.1465249Z * [new tag] ciflow/inductor/169503 -> ciflow/inductor/169503 2025-12-04T08:57:44.1466099Z * [new tag] ciflow/inductor/169504 -> ciflow/inductor/169504 2025-12-04T08:57:44.1467269Z * [new tag] ciflow/inductor/169505 -> ciflow/inductor/169505 2025-12-04T08:57:44.1468485Z * [new tag] ciflow/inductor/169508 -> ciflow/inductor/169508 2025-12-04T08:57:44.1469174Z * [new tag] ciflow/inductor/169509 -> ciflow/inductor/169509 2025-12-04T08:57:44.1469950Z * [new tag] ciflow/inductor/169513 -> ciflow/inductor/169513 2025-12-04T08:57:44.1470735Z * [new tag] ciflow/inductor/169514 -> ciflow/inductor/169514 2025-12-04T08:57:44.1471517Z * [new tag] ciflow/inductor/169515 -> ciflow/inductor/169515 2025-12-04T08:57:44.1472257Z * [new tag] ciflow/inductor/169517 -> ciflow/inductor/169517 2025-12-04T08:57:44.1473000Z * [new tag] ciflow/inductor/169519 -> ciflow/inductor/169519 2025-12-04T08:57:44.1473761Z * [new tag] ciflow/inductor/169520 -> ciflow/inductor/169520 2025-12-04T08:57:44.1474490Z * [new tag] ciflow/inductor/169521 -> ciflow/inductor/169521 2025-12-04T08:57:44.1475242Z * [new tag] ciflow/inductor/169524 -> ciflow/inductor/169524 2025-12-04T08:57:44.1475972Z * [new tag] ciflow/inductor/169527 -> ciflow/inductor/169527 2025-12-04T08:57:44.1476716Z * [new tag] ciflow/inductor/169528 -> ciflow/inductor/169528 2025-12-04T08:57:44.1477736Z * [new tag] ciflow/inductor/169532 -> ciflow/inductor/169532 2025-12-04T08:57:44.1478418Z * [new tag] ciflow/inductor/169535 -> ciflow/inductor/169535 2025-12-04T08:57:44.1482692Z * [new tag] ciflow/inductor/169536 -> ciflow/inductor/169536 2025-12-04T08:57:44.1483509Z * [new tag] ciflow/inductor/169547 -> ciflow/inductor/169547 
2025-12-04T08:57:44.1484461Z * [new tag] ciflow/inductor/169548 -> ciflow/inductor/169548 2025-12-04T08:57:44.1485079Z * [new tag] ciflow/inductor/169549 -> ciflow/inductor/169549 2025-12-04T08:57:44.1485872Z * [new tag] ciflow/inductor/169551 -> ciflow/inductor/169551 2025-12-04T08:57:44.1486631Z * [new tag] ciflow/inductor/169552 -> ciflow/inductor/169552 2025-12-04T08:57:44.1487399Z * [new tag] ciflow/inductor/169553 -> ciflow/inductor/169553 2025-12-04T08:57:44.1488503Z * [new tag] ciflow/inductor/3b9a386 -> ciflow/inductor/3b9a386 2025-12-04T08:57:44.1489399Z * [new tag] ciflow/inductor/3d4b92b -> ciflow/inductor/3d4b92b 2025-12-04T08:57:44.1490395Z * [new tag] ciflow/inductor/d224ac7 -> ciflow/inductor/d224ac7 2025-12-04T08:57:44.1491413Z * [new tag] ciflow/linux-aarch64/157994 -> ciflow/linux-aarch64/157994 2025-12-04T08:57:44.1492108Z * [new tag] ciflow/linux-aarch64/166075 -> ciflow/linux-aarch64/166075 2025-12-04T08:57:44.1492819Z * [new tag] ciflow/linux-aarch64/166876 -> ciflow/linux-aarch64/166876 2025-12-04T08:57:44.1493792Z * [new tag] ciflow/linux-aarch64/167981 -> ciflow/linux-aarch64/167981 2025-12-04T08:57:44.1494748Z * [new tag] ciflow/mps/166254 -> ciflow/mps/166254 2025-12-04T08:57:44.1495530Z * [new tag] ciflow/mps/169017 -> ciflow/mps/169017 2025-12-04T08:57:44.1496499Z * [new tag] ciflow/mps/169372 -> ciflow/mps/169372 2025-12-04T08:57:44.1497237Z * [new tag] ciflow/mps/169478 -> ciflow/mps/169478 2025-12-04T08:57:44.1498070Z * [new tag] ciflow/op-benchmark/157994 -> ciflow/op-benchmark/157994 2025-12-04T08:57:44.1498818Z * [new tag] ciflow/op-benchmark/166075 -> ciflow/op-benchmark/166075 2025-12-04T08:57:44.1499556Z * [new tag] ciflow/op-benchmark/169544 -> ciflow/op-benchmark/169544 2025-12-04T08:57:44.1500507Z * [new tag] ciflow/periodic-rocm-mi200/165997 -> ciflow/periodic-rocm-mi200/165997 2025-12-04T08:57:44.1501387Z * [new tag] ciflow/periodic-rocm-mi200/166517 -> ciflow/periodic-rocm-mi200/166517 2025-12-04T08:57:44.1502095Z * [new tag] ciflow/periodic-rocm-mi200/169063 -> ciflow/periodic-rocm-mi200/169063 2025-12-04T08:57:44.1503333Z * [new tag] ciflow/periodic-rocm-mi200/169425 -> ciflow/periodic-rocm-mi200/169425 2025-12-04T08:57:44.1504202Z * [new tag] ciflow/periodic-rocm-mi300/166517 -> ciflow/periodic-rocm-mi300/166517 2025-12-04T08:57:44.1504971Z * [new tag] ciflow/periodic-rocm-mi300/169063 -> ciflow/periodic-rocm-mi300/169063 2025-12-04T08:57:44.1505770Z * [new tag] ciflow/periodic-rocm-mi300/169425 -> ciflow/periodic-rocm-mi300/169425 2025-12-04T08:57:44.1506839Z * [new tag] ciflow/periodic/054a2fd -> ciflow/periodic/054a2fd 2025-12-04T08:57:44.1507477Z * [new tag] ciflow/periodic/167207 -> ciflow/periodic/167207 2025-12-04T08:57:44.1508786Z * [new tag] ciflow/periodic/167978 -> ciflow/periodic/167978 2025-12-04T08:57:44.1509472Z * [new tag] ciflow/periodic/168096 -> ciflow/periodic/168096 2025-12-04T08:57:44.1510211Z * [new tag] ciflow/periodic/169286 -> ciflow/periodic/169286 2025-12-04T08:57:44.1511218Z * [new tag] ciflow/periodic/2a6d37d -> ciflow/periodic/2a6d37d 2025-12-04T08:57:44.1512048Z * [new tag] ciflow/periodic/317eeb8 -> ciflow/periodic/317eeb8 2025-12-04T08:57:44.1512974Z * [new tag] ciflow/periodic/3c32 -> ciflow/periodic/3c32 2025-12-04T08:57:44.1513800Z * [new tag] ciflow/periodic/3e98831 -> ciflow/periodic/3e98831 2025-12-04T08:57:44.1515419Z * [new tag] ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 2025-12-04T08:57:44.1516122Z * [new tag] 
ciflow/periodic/94512-point -> ciflow/periodic/94512-point 2025-12-04T08:57:44.1517345Z * [new tag] ciflow/periodic/csl/test87519 -> ciflow/periodic/csl/test87519 2025-12-04T08:57:44.1518234Z * [new tag] ciflow/periodic/csltest88275 -> ciflow/periodic/csltest88275 2025-12-04T08:57:44.1519248Z * [new tag] ciflow/periodic/csltest88761 -> ciflow/periodic/csltest88761 2025-12-04T08:57:44.1520064Z * [new tag] ciflow/periodic/release_1.12 -> ciflow/periodic/release_1.12 2025-12-04T08:57:44.1521164Z * [new tag] ciflow/periodic/release_1.12.0 -> ciflow/periodic/release_1.12.0 2025-12-04T08:57:44.1522182Z * [new tag] ciflow/periodic/sha-ec5b83 -> ciflow/periodic/sha-ec5b83 2025-12-04T08:57:44.1522982Z * [new tag] ciflow/pull/167207 -> ciflow/pull/167207 2025-12-04T08:57:44.1524128Z * [new tag] ciflow/quantization-periodic/169207 -> ciflow/quantization-periodic/169207 2025-12-04T08:57:44.1524828Z * [new tag] ciflow/rocm-mi200/165545 -> ciflow/rocm-mi200/165545 2025-12-04T08:57:44.1525540Z * [new tag] ciflow/rocm-mi200/165997 -> ciflow/rocm-mi200/165997 2025-12-04T08:57:44.1526230Z * [new tag] ciflow/rocm-mi200/168096 -> ciflow/rocm-mi200/168096 2025-12-04T08:57:44.1527178Z * [new tag] ciflow/rocm-mi200/168275 -> ciflow/rocm-mi200/168275 2025-12-04T08:57:44.1527818Z * [new tag] ciflow/rocm-mi200/169063 -> ciflow/rocm-mi200/169063 2025-12-04T08:57:44.1528859Z * [new tag] ciflow/rocm-mi200/169356 -> ciflow/rocm-mi200/169356 2025-12-04T08:57:44.1529546Z * [new tag] ciflow/rocm-mi200/169425 -> ciflow/rocm-mi200/169425 2025-12-04T08:57:44.1530343Z * [new tag] ciflow/rocm-mi300/165545 -> ciflow/rocm-mi300/165545 2025-12-04T08:57:44.1531284Z * [new tag] ciflow/rocm-mi300/167157 -> ciflow/rocm-mi300/167157 2025-12-04T08:57:44.1531881Z * [new tag] ciflow/rocm-mi300/168096 -> ciflow/rocm-mi300/168096 2025-12-04T08:57:44.1532605Z * [new tag] ciflow/rocm-mi300/169063 -> ciflow/rocm-mi300/169063 2025-12-04T08:57:44.1533346Z * [new tag] ciflow/rocm-mi300/169425 -> ciflow/rocm-mi300/169425 2025-12-04T08:57:44.1534597Z * [new tag] ciflow/rocm-mi355/167157 -> ciflow/rocm-mi355/167157 2025-12-04T08:57:44.1535254Z * [new tag] ciflow/rocm-mi355/168275 -> ciflow/rocm-mi355/168275 2025-12-04T08:57:44.1535980Z * [new tag] ciflow/rocm-mi355/169425 -> ciflow/rocm-mi355/169425 2025-12-04T08:57:44.1536971Z * [new tag] ciflow/rocm-navi31/168275 -> ciflow/rocm-navi31/168275 2025-12-04T08:57:44.1537653Z * [new tag] ciflow/rocm-navi31/169425 -> ciflow/rocm-navi31/169425 2025-12-04T08:57:44.1538478Z * [new tag] ciflow/rocm/115316 -> ciflow/rocm/115316 2025-12-04T08:57:44.1539175Z * [new tag] ciflow/rocm/148492 -> ciflow/rocm/148492 2025-12-04T08:57:44.1539891Z * [new tag] ciflow/rocm/160685 -> ciflow/rocm/160685 2025-12-04T08:57:44.1540622Z * [new tag] ciflow/rocm/161607 -> ciflow/rocm/161607 2025-12-04T08:57:44.1541308Z * [new tag] ciflow/rocm/162052 -> ciflow/rocm/162052 2025-12-04T08:57:44.1542040Z * [new tag] ciflow/rocm/165997 -> ciflow/rocm/165997 2025-12-04T08:57:44.1542761Z * [new tag] ciflow/rocm/166165 -> ciflow/rocm/166165 2025-12-04T08:57:44.1543496Z * [new tag] ciflow/rocm/166517 -> ciflow/rocm/166517 2025-12-04T08:57:44.1544174Z * [new tag] ciflow/rocm/167207 -> ciflow/rocm/167207 2025-12-04T08:57:44.1544991Z * [new tag] ciflow/rocm/167536 -> ciflow/rocm/167536 2025-12-04T08:57:44.1546227Z * [new tag] ciflow/rocm/167781 -> ciflow/rocm/167781 2025-12-04T08:57:44.1547189Z * [new tag] ciflow/rocm/167989 -> ciflow/rocm/167989 2025-12-04T08:57:44.1548138Z * [new tag] ciflow/rocm/168073 -> ciflow/rocm/168073 
2025-12-04T08:57:44.1549004Z * [new tag] ciflow/rocm/168195 -> ciflow/rocm/168195 2025-12-04T08:57:44.1549730Z * [new tag] ciflow/rocm/168939 -> ciflow/rocm/168939 2025-12-04T08:57:44.1550479Z * [new tag] ciflow/rocm/168971 -> ciflow/rocm/168971 2025-12-04T08:57:44.1551368Z * [new tag] ciflow/rocm/169024 -> ciflow/rocm/169024 2025-12-04T08:57:44.1552028Z * [new tag] ciflow/rocm/169200 -> ciflow/rocm/169200 2025-12-04T08:57:44.1552756Z * [new tag] ciflow/rocm/169216 -> ciflow/rocm/169216 2025-12-04T08:57:44.1553516Z * [new tag] ciflow/rocm/169312 -> ciflow/rocm/169312 2025-12-04T08:57:44.1554215Z * [new tag] ciflow/rocm/169380 -> ciflow/rocm/169380 2025-12-04T08:57:44.1554963Z * [new tag] ciflow/rocm/169427 -> ciflow/rocm/169427 2025-12-04T08:57:44.1555692Z * [new tag] ciflow/rocm/169455 -> ciflow/rocm/169455 2025-12-04T08:57:44.1556446Z * [new tag] ciflow/rocm/169470 -> ciflow/rocm/169470 2025-12-04T08:57:44.1557180Z * [new tag] ciflow/rocm/169471 -> ciflow/rocm/169471 2025-12-04T08:57:44.1557892Z * [new tag] ciflow/rocm/169472 -> ciflow/rocm/169472 2025-12-04T08:57:44.1558648Z * [new tag] ciflow/rocm/169514 -> ciflow/rocm/169514 2025-12-04T08:57:44.1559823Z * [new tag] ciflow/slow/01c7106 -> ciflow/slow/01c7106 2025-12-04T08:57:44.1560592Z * [new tag] ciflow/slow/0577043 -> ciflow/slow/0577043 2025-12-04T08:57:44.1561987Z * [new tag] ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym -> ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym 2025-12-04T08:57:44.1562439Z * [new tag] ciflow/slow/0e81104 -> ciflow/slow/0e81104 2025-12-04T08:57:44.1563138Z * [new tag] ciflow/slow/167207 -> ciflow/slow/167207 2025-12-04T08:57:44.1563844Z * [new tag] ciflow/slow/168050 -> ciflow/slow/168050 2025-12-04T08:57:44.1564771Z * [new tag] ciflow/slow/1732077 -> ciflow/slow/1732077 2025-12-04T08:57:44.1565680Z * [new tag] ciflow/slow/187eb7c -> ciflow/slow/187eb7c 2025-12-04T08:57:44.1566793Z * [new tag] ciflow/slow/1faef89 -> ciflow/slow/1faef89 2025-12-04T08:57:44.1567959Z * [new tag] ciflow/slow/3920ec1 -> ciflow/slow/3920ec1 2025-12-04T08:57:44.1569013Z * [new tag] ciflow/slow/3b7c6b2 -> ciflow/slow/3b7c6b2 2025-12-04T08:57:44.1569983Z * [new tag] ciflow/slow/59a3759 -> ciflow/slow/59a3759 2025-12-04T08:57:44.1570865Z * [new tag] ciflow/slow/70ef0bb -> ciflow/slow/70ef0bb 2025-12-04T08:57:44.1571769Z * [new tag] ciflow/slow/788ff06 -> ciflow/slow/788ff06 2025-12-04T08:57:44.1573259Z * [new tag] ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym -> ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym 2025-12-04T08:57:44.1573954Z * [new tag] ciflow/slow/9d85864 -> ciflow/slow/9d85864 2025-12-04T08:57:44.1574941Z * [new tag] ciflow/slow/9ffad5b -> ciflow/slow/9ffad5b 2025-12-04T08:57:44.1575937Z * [new tag] ciflow/slow/a206e8b -> ciflow/slow/a206e8b 2025-12-04T08:57:44.1576858Z * [new tag] ciflow/slow/a837609 -> ciflow/slow/a837609 2025-12-04T08:57:44.1577833Z * [new tag] ciflow/slow/af841f3 -> ciflow/slow/af841f3 2025-12-04T08:57:44.1579583Z * [new tag] ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym -> ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym 2025-12-04T08:57:44.1580123Z * [new tag] ciflow/torchbench/168175 -> ciflow/torchbench/168175 2025-12-04T08:57:44.1581035Z * [new tag] ciflow/trunk/148492 -> ciflow/trunk/148492 2025-12-04T08:57:44.1581748Z * [new tag] ciflow/trunk/157149 -> ciflow/trunk/157149 2025-12-04T08:57:44.1582586Z * [new tag] ciflow/trunk/157994 -> ciflow/trunk/157994 2025-12-04T08:57:44.1583326Z * [new tag] ciflow/trunk/159718 -> 
ciflow/trunk/159718 2025-12-04T08:57:44.1584028Z * [new tag] ciflow/trunk/160685 -> ciflow/trunk/160685 2025-12-04T08:57:44.1584790Z * [new tag] ciflow/trunk/160729 -> ciflow/trunk/160729 2025-12-04T08:57:44.1585469Z * [new tag] ciflow/trunk/162275 -> ciflow/trunk/162275 2025-12-04T08:57:44.1586181Z * [new tag] ciflow/trunk/162795 -> ciflow/trunk/162795 2025-12-04T08:57:44.1586904Z * [new tag] ciflow/trunk/163245 -> ciflow/trunk/163245 2025-12-04T08:57:44.1587623Z * [new tag] ciflow/trunk/163942 -> ciflow/trunk/163942 2025-12-04T08:57:44.1588335Z * [new tag] ciflow/trunk/165274 -> ciflow/trunk/165274 2025-12-04T08:57:44.1589554Z * [new tag] ciflow/trunk/165483 -> ciflow/trunk/165483 2025-12-04T08:57:44.1590688Z * [new tag] ciflow/trunk/165728 -> ciflow/trunk/165728 2025-12-04T08:57:44.1591631Z * [new tag] ciflow/trunk/165922 -> ciflow/trunk/165922 2025-12-04T08:57:44.1592309Z * [new tag] ciflow/trunk/166075 -> ciflow/trunk/166075 2025-12-04T08:57:44.1593043Z * [new tag] ciflow/trunk/166165 -> ciflow/trunk/166165 2025-12-04T08:57:44.1593805Z * [new tag] ciflow/trunk/166829 -> ciflow/trunk/166829 2025-12-04T08:57:44.1594761Z * [new tag] ciflow/trunk/166843 -> ciflow/trunk/166843 2025-12-04T08:57:44.1595434Z * [new tag] ciflow/trunk/166876 -> ciflow/trunk/166876 2025-12-04T08:57:44.1596152Z * [new tag] ciflow/trunk/167207 -> ciflow/trunk/167207 2025-12-04T08:57:44.1596907Z * [new tag] ciflow/trunk/167536 -> ciflow/trunk/167536 2025-12-04T08:57:44.1597681Z * [new tag] ciflow/trunk/167552 -> ciflow/trunk/167552 2025-12-04T08:57:44.1598406Z * [new tag] ciflow/trunk/167555 -> ciflow/trunk/167555 2025-12-04T08:57:44.1599122Z * [new tag] ciflow/trunk/167599 -> ciflow/trunk/167599 2025-12-04T08:57:44.1599881Z * [new tag] ciflow/trunk/167659 -> ciflow/trunk/167659 2025-12-04T08:57:44.1600849Z * [new tag] ciflow/trunk/167672 -> ciflow/trunk/167672 2025-12-04T08:57:44.1601517Z * [new tag] ciflow/trunk/167742 -> ciflow/trunk/167742 2025-12-04T08:57:44.1602261Z * [new tag] ciflow/trunk/167781 -> ciflow/trunk/167781 2025-12-04T08:57:44.1603363Z * [new tag] ciflow/trunk/167837 -> ciflow/trunk/167837 2025-12-04T08:57:44.1604104Z * [new tag] ciflow/trunk/167887 -> ciflow/trunk/167887 2025-12-04T08:57:44.1604804Z * [new tag] ciflow/trunk/167978 -> ciflow/trunk/167978 2025-12-04T08:57:44.1605520Z * [new tag] ciflow/trunk/168050 -> ciflow/trunk/168050 2025-12-04T08:57:44.1606256Z * [new tag] ciflow/trunk/168051 -> ciflow/trunk/168051 2025-12-04T08:57:44.1607214Z * [new tag] ciflow/trunk/168096 -> ciflow/trunk/168096 2025-12-04T08:57:44.1608567Z * [new tag] ciflow/trunk/168127 -> ciflow/trunk/168127 2025-12-04T08:57:44.1609250Z * [new tag] ciflow/trunk/168157 -> ciflow/trunk/168157 2025-12-04T08:57:44.1609988Z * [new tag] ciflow/trunk/168175 -> ciflow/trunk/168175 2025-12-04T08:57:44.1610714Z * [new tag] ciflow/trunk/168209 -> ciflow/trunk/168209 2025-12-04T08:57:44.1611637Z * [new tag] ciflow/trunk/168213 -> ciflow/trunk/168213 2025-12-04T08:57:44.1612442Z * [new tag] ciflow/trunk/168226 -> ciflow/trunk/168226 2025-12-04T08:57:44.1613277Z * [new tag] ciflow/trunk/168262 -> ciflow/trunk/168262 2025-12-04T08:57:44.1614395Z * [new tag] ciflow/trunk/168275 -> ciflow/trunk/168275 2025-12-04T08:57:44.1615191Z * [new tag] ciflow/trunk/168328 -> ciflow/trunk/168328 2025-12-04T08:57:44.1615963Z * [new tag] ciflow/trunk/168368 -> ciflow/trunk/168368 2025-12-04T08:57:44.1616725Z * [new tag] ciflow/trunk/168917 -> ciflow/trunk/168917 2025-12-04T08:57:44.1617479Z * [new tag] ciflow/trunk/168933 -> ciflow/trunk/168933 
2025-12-04T08:57:44.1618437Z * [new tag] ciflow/trunk/168941 -> ciflow/trunk/168941 2025-12-04T08:57:44.1619140Z * [new tag] ciflow/trunk/168955 -> ciflow/trunk/168955 2025-12-04T08:57:44.1619917Z * [new tag] ciflow/trunk/168980 -> ciflow/trunk/168980 2025-12-04T08:57:44.1620900Z * [new tag] ciflow/trunk/169004 -> ciflow/trunk/169004 2025-12-04T08:57:44.1621624Z * [new tag] ciflow/trunk/169006 -> ciflow/trunk/169006 2025-12-04T08:57:44.1622387Z * [new tag] ciflow/trunk/169023 -> ciflow/trunk/169023 2025-12-04T08:57:44.1623182Z * [new tag] ciflow/trunk/169025 -> ciflow/trunk/169025 2025-12-04T08:57:44.1623962Z * [new tag] ciflow/trunk/169048 -> ciflow/trunk/169048 2025-12-04T08:57:44.1624716Z * [new tag] ciflow/trunk/169066 -> ciflow/trunk/169066 2025-12-04T08:57:44.1625608Z * [new tag] ciflow/trunk/169091 -> ciflow/trunk/169091 2025-12-04T08:57:44.1626357Z * [new tag] ciflow/trunk/169102 -> ciflow/trunk/169102 2025-12-04T08:57:44.1627097Z * [new tag] ciflow/trunk/169103 -> ciflow/trunk/169103 2025-12-04T08:57:44.1628021Z * [new tag] ciflow/trunk/169125 -> ciflow/trunk/169125 2025-12-04T08:57:44.1628910Z * [new tag] ciflow/trunk/169139 -> ciflow/trunk/169139 2025-12-04T08:57:44.1629836Z * [new tag] ciflow/trunk/169148 -> ciflow/trunk/169148 2025-12-04T08:57:44.1630520Z * [new tag] ciflow/trunk/169151 -> ciflow/trunk/169151 2025-12-04T08:57:44.1631266Z * [new tag] ciflow/trunk/169156 -> ciflow/trunk/169156 2025-12-04T08:57:44.1632167Z * [new tag] ciflow/trunk/169176 -> ciflow/trunk/169176 2025-12-04T08:57:44.1632888Z * [new tag] ciflow/trunk/169204 -> ciflow/trunk/169204 2025-12-04T08:57:44.1633645Z * [new tag] ciflow/trunk/169207 -> ciflow/trunk/169207 2025-12-04T08:57:44.1634375Z * [new tag] ciflow/trunk/169211 -> ciflow/trunk/169211 2025-12-04T08:57:44.1635304Z * [new tag] ciflow/trunk/169229 -> ciflow/trunk/169229 2025-12-04T08:57:44.1636292Z * [new tag] ciflow/trunk/169231 -> ciflow/trunk/169231 2025-12-04T08:57:44.1636977Z * [new tag] ciflow/trunk/169260 -> ciflow/trunk/169260 2025-12-04T08:57:44.1638033Z * [new tag] ciflow/trunk/169271 -> ciflow/trunk/169271 2025-12-04T08:57:44.1638791Z * [new tag] ciflow/trunk/169280 -> ciflow/trunk/169280 2025-12-04T08:57:44.1639498Z * [new tag] ciflow/trunk/169281 -> ciflow/trunk/169281 2025-12-04T08:57:44.1640226Z * [new tag] ciflow/trunk/169286 -> ciflow/trunk/169286 2025-12-04T08:57:44.1641186Z * [new tag] ciflow/trunk/169293 -> ciflow/trunk/169293 2025-12-04T08:57:44.1641866Z * [new tag] ciflow/trunk/169296 -> ciflow/trunk/169296 2025-12-04T08:57:44.1642609Z * [new tag] ciflow/trunk/169304 -> ciflow/trunk/169304 2025-12-04T08:57:44.1643376Z * [new tag] ciflow/trunk/169305 -> ciflow/trunk/169305 2025-12-04T08:57:44.1644096Z * [new tag] ciflow/trunk/169312 -> ciflow/trunk/169312 2025-12-04T08:57:44.1645210Z * [new tag] ciflow/trunk/169328 -> ciflow/trunk/169328 2025-12-04T08:57:44.1645928Z * [new tag] ciflow/trunk/169343 -> ciflow/trunk/169343 2025-12-04T08:57:44.1646649Z * [new tag] ciflow/trunk/169355 -> ciflow/trunk/169355 2025-12-04T08:57:44.1647415Z * [new tag] ciflow/trunk/169370 -> ciflow/trunk/169370 2025-12-04T08:57:44.1648363Z * [new tag] ciflow/trunk/169379 -> ciflow/trunk/169379 2025-12-04T08:57:44.1649071Z * [new tag] ciflow/trunk/169380 -> ciflow/trunk/169380 2025-12-04T08:57:44.1649820Z * [new tag] ciflow/trunk/169385 -> ciflow/trunk/169385 2025-12-04T08:57:44.1650579Z * [new tag] ciflow/trunk/169387 -> ciflow/trunk/169387 2025-12-04T08:57:44.1651518Z * [new tag] ciflow/trunk/169410 -> ciflow/trunk/169410 
2025-12-04T08:57:44.1652189Z * [new tag] ciflow/trunk/169412 -> ciflow/trunk/169412 2025-12-04T08:57:44.1652953Z * [new tag] ciflow/trunk/169418 -> ciflow/trunk/169418 2025-12-04T08:57:44.1654017Z * [new tag] ciflow/trunk/169423 -> ciflow/trunk/169423 2025-12-04T08:57:44.1654785Z * [new tag] ciflow/trunk/169427 -> ciflow/trunk/169427 2025-12-04T08:57:44.1655734Z * [new tag] ciflow/trunk/169430 -> ciflow/trunk/169430 2025-12-04T08:57:44.1656499Z * [new tag] ciflow/trunk/169437 -> ciflow/trunk/169437 2025-12-04T08:57:44.1657268Z * [new tag] ciflow/trunk/169442 -> ciflow/trunk/169442 2025-12-04T08:57:44.1658016Z * [new tag] ciflow/trunk/169452 -> ciflow/trunk/169452 2025-12-04T08:57:44.1658791Z * [new tag] ciflow/trunk/169454 -> ciflow/trunk/169454 2025-12-04T08:57:44.1659538Z * [new tag] ciflow/trunk/169459 -> ciflow/trunk/169459 2025-12-04T08:57:44.1660931Z * [new tag] ciflow/trunk/169474 -> ciflow/trunk/169474 2025-12-04T08:57:44.1661659Z * [new tag] ciflow/trunk/169475 -> ciflow/trunk/169475 2025-12-04T08:57:44.1662414Z * [new tag] ciflow/trunk/169476 -> ciflow/trunk/169476 2025-12-04T08:57:44.1663381Z * [new tag] ciflow/trunk/169487 -> ciflow/trunk/169487 2025-12-04T08:57:44.1664092Z * [new tag] ciflow/trunk/169497 -> ciflow/trunk/169497 2025-12-04T08:57:44.1665089Z * [new tag] ciflow/trunk/169503 -> ciflow/trunk/169503 2025-12-04T08:57:44.1665932Z * [new tag] ciflow/trunk/169505 -> ciflow/trunk/169505 2025-12-04T08:57:44.1666684Z * [new tag] ciflow/trunk/169507 -> ciflow/trunk/169507 2025-12-04T08:57:44.1667439Z * [new tag] ciflow/trunk/169514 -> ciflow/trunk/169514 2025-12-04T08:57:44.1668156Z * [new tag] ciflow/trunk/169517 -> ciflow/trunk/169517 2025-12-04T08:57:44.1668963Z * [new tag] ciflow/trunk/169519 -> ciflow/trunk/169519 2025-12-04T08:57:44.1669625Z * [new tag] ciflow/trunk/169528 -> ciflow/trunk/169528 2025-12-04T08:57:44.1670343Z * [new tag] ciflow/trunk/169541 -> ciflow/trunk/169541 2025-12-04T08:57:44.1671293Z * [new tag] ciflow/trunk/169555 -> ciflow/trunk/169555 2025-12-04T08:57:44.1672412Z * [new tag] ciflow/unstable/123 -> ciflow/unstable/123 2025-12-04T08:57:44.1673178Z * [new tag] ciflow/vllm/165270 -> ciflow/vllm/165270 2025-12-04T08:57:44.1673912Z * [new tag] ciflow/vllm/165274 -> ciflow/vllm/165274 2025-12-04T08:57:44.1674623Z * [new tag] ciflow/vllm/166494 -> ciflow/vllm/166494 2025-12-04T08:57:44.1675318Z * [new tag] ciflow/vllm/169219 -> ciflow/vllm/169219 2025-12-04T08:57:44.1675991Z * [new tag] ciflow/vllm/169220 -> ciflow/vllm/169220 2025-12-04T08:57:44.1676905Z * [new tag] ciflow/xpu/157994 -> ciflow/xpu/157994 2025-12-04T08:57:44.1677542Z * [new tag] ciflow/xpu/159718 -> ciflow/xpu/159718 2025-12-04T08:57:44.1678265Z * [new tag] ciflow/xpu/161940 -> ciflow/xpu/161940 2025-12-04T08:57:44.1679587Z * [new tag] ciflow/xpu/163251 -> ciflow/xpu/163251 2025-12-04T08:57:44.1680284Z * [new tag] ciflow/xpu/166829 -> ciflow/xpu/166829 2025-12-04T08:57:44.1680958Z * [new tag] ciflow/xpu/166843 -> ciflow/xpu/166843 2025-12-04T08:57:44.1681693Z * [new tag] ciflow/xpu/167972 -> ciflow/xpu/167972 2025-12-04T08:57:44.1682395Z * [new tag] ciflow/xpu/167981 -> ciflow/xpu/167981 2025-12-04T08:57:44.1683091Z * [new tag] ciflow/xpu/168213 -> ciflow/xpu/168213 2025-12-04T08:57:44.1683816Z * [new tag] ciflow/xpu/168262 -> ciflow/xpu/168262 2025-12-04T08:57:44.1684528Z * [new tag] ciflow/xpu/168328 -> ciflow/xpu/168328 2025-12-04T08:57:44.1685561Z * [new tag] ciflow/xpu/168950 -> ciflow/xpu/168950 2025-12-04T08:57:44.1686731Z * [new tag] ciflow/xpu/169039 -> ciflow/xpu/169039 
2025-12-04T08:57:44.1687657Z * [new tag] ciflow/xpu/169200 -> ciflow/xpu/169200 2025-12-04T08:57:44.1688414Z * [new tag] ciflow/xpu/169203 -> ciflow/xpu/169203 2025-12-04T08:57:44.1689160Z * [new tag] ciflow/xpu/169229 -> ciflow/xpu/169229 2025-12-04T08:57:44.1689929Z * [new tag] ciflow/xpu/169230 -> ciflow/xpu/169230 2025-12-04T08:57:44.1690689Z * [new tag] ciflow/xpu/169231 -> ciflow/xpu/169231 2025-12-04T08:57:44.1691731Z * [new tag] ciflow/xpu/169241 -> ciflow/xpu/169241 2025-12-04T08:57:44.1692417Z * [new tag] ciflow/xpu/169280 -> ciflow/xpu/169280 2025-12-04T08:57:44.1693231Z * [new tag] ciflow/xpu/169296 -> ciflow/xpu/169296 2025-12-04T08:57:44.1694536Z * [new tag] ciflow/xpu/169353 -> ciflow/xpu/169353 2025-12-04T08:57:44.1695128Z * [new tag] ciflow/xpu/169410 -> ciflow/xpu/169410 2025-12-04T08:57:44.1695904Z * [new tag] ciflow/xpu/169442 -> ciflow/xpu/169442 2025-12-04T08:57:44.1696707Z * [new tag] ciflow/xpu/169555 -> ciflow/xpu/169555 2025-12-04T08:57:44.1697669Z * [new tag] cslpull75 -> cslpull75 2025-12-04T08:57:44.1698460Z * [new tag] cslpull76 -> cslpull76 2025-12-04T08:57:44.1699244Z * [new tag] cslpull77 -> cslpull77 2025-12-04T08:57:44.1700127Z * [new tag] cslpull78 -> cslpull78 2025-12-04T08:57:44.1701319Z * [new tag] cslpull79 -> cslpull79 2025-12-04T08:57:44.1702348Z * [new tag] cslpull80 -> cslpull80 2025-12-04T08:57:44.1703242Z * [new tag] cslpull81 -> cslpull81 2025-12-04T08:57:44.1704030Z * [new tag] cslpull82 -> cslpull82 2025-12-04T08:57:44.1705042Z * [new tag] cslpull83 -> cslpull83 2025-12-04T08:57:44.1706014Z * [new tag] cslpull84 -> cslpull84 2025-12-04T08:57:44.1706892Z * [new tag] cslpull85 -> cslpull85 2025-12-04T08:57:44.1707763Z * [new tag] cslpull86 -> cslpull86 2025-12-04T08:57:44.1708659Z * [new tag] cslpull87 -> cslpull87 2025-12-04T08:57:44.1709567Z * [new tag] cslpull88 -> cslpull88 2025-12-04T08:57:44.1710307Z * [new tag] cslpull89 -> cslpull89 2025-12-04T08:57:44.1711004Z * [new tag] cslpull90 -> cslpull90 2025-12-04T08:57:44.1712295Z * [new tag] cslpull91 -> cslpull91 2025-12-04T08:57:44.1713584Z * [new tag] cslpull92 -> cslpull92 2025-12-04T08:57:44.1714497Z * [new tag] flight_5 -> flight_5 2025-12-04T08:57:44.1715521Z * [new tag] flight_5.1 -> flight_5.1 2025-12-04T08:57:44.1716438Z * [new tag] flight_5.2 -> flight_5.2 2025-12-04T08:57:44.1717293Z * [new tag] flight_5.3 -> flight_5.3 2025-12-04T08:57:44.1718088Z * [new tag] forpull1 -> forpull1 2025-12-04T08:57:44.1719174Z * [new tag] malfet/tag-2ef5611 -> malfet/tag-2ef5611 2025-12-04T08:57:44.1719980Z * [new tag] malfet/tag-317b1a0 -> malfet/tag-317b1a0 2025-12-04T08:57:44.1720912Z * [new tag] malfet/tag-ec6f767 -> malfet/tag-ec6f767 2025-12-04T08:57:44.1721716Z * [new tag] nightly-binary -> nightly-binary 2025-12-04T08:57:44.1722661Z * [new tag] sqzhang_flight4_plus -> sqzhang_flight4_plus 2025-12-04T08:57:44.1723709Z * [new tag] sqzhang_flight_3 -> sqzhang_flight_3 2025-12-04T08:57:44.1725042Z * [new tag] trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 -> trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 2025-12-04T08:57:44.1725781Z * [new tag] trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e -> trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e 2025-12-04T08:57:44.1726976Z * [new tag] trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 -> trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 2025-12-04T08:57:44.1727983Z * [new tag] trunk/07dcc0b83db3211653a38565a24e15acdba75654 -> trunk/07dcc0b83db3211653a38565a24e15acdba75654 2025-12-04T08:57:44.1728912Z * [new tag] 
trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb -> trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb 2025-12-04T08:57:44.1729796Z * [new tag] trunk/088048f2fea28ff7d450f65c72419ca45780d30b -> trunk/088048f2fea28ff7d450f65c72419ca45780d30b 2025-12-04T08:57:44.1730674Z * [new tag] trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 -> trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 2025-12-04T08:57:44.1731512Z * [new tag] trunk/0b80a4c62b94402844bf221791c096b0035c6d75 -> trunk/0b80a4c62b94402844bf221791c096b0035c6d75 2025-12-04T08:57:44.1732642Z * [new tag] trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 -> trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 2025-12-04T08:57:44.1733768Z * [new tag] trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 -> trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 2025-12-04T08:57:44.1734850Z * [new tag] trunk/135f3753c418a6879b1954904184937b67e61688 -> trunk/135f3753c418a6879b1954904184937b67e61688 2025-12-04T08:57:44.1735824Z * [new tag] trunk/15da21026cb13cd20257dc9e96830db108743c10 -> trunk/15da21026cb13cd20257dc9e96830db108743c10 2025-12-04T08:57:44.1736753Z * [new tag] trunk/166efdad2ac827f30fb02504c6017520257f88ec -> trunk/166efdad2ac827f30fb02504c6017520257f88ec 2025-12-04T08:57:44.1737678Z * [new tag] trunk/174272c15fae553d8488140af931f7d8050a313f -> trunk/174272c15fae553d8488140af931f7d8050a313f 2025-12-04T08:57:44.1738865Z * [new tag] trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 -> trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 2025-12-04T08:57:44.1739728Z * [new tag] trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 -> trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 2025-12-04T08:57:44.1740662Z * [new tag] trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 -> trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 2025-12-04T08:57:44.1741563Z * [new tag] trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 -> trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 2025-12-04T08:57:44.1742467Z * [new tag] trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e -> trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e 2025-12-04T08:57:44.1743349Z * [new tag] trunk/1c87554d74140eaee964ca8b1832cede67f5f520 -> trunk/1c87554d74140eaee964ca8b1832cede67f5f520 2025-12-04T08:57:44.1744268Z * [new tag] trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 -> trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 2025-12-04T08:57:44.1745260Z * [new tag] trunk/1cee47d6ce0a02227185b566593f002dd639ca0c -> trunk/1cee47d6ce0a02227185b566593f002dd639ca0c 2025-12-04T08:57:44.1746105Z * [new tag] trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d -> trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d 2025-12-04T08:57:44.1747031Z * [new tag] trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 -> trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 2025-12-04T08:57:44.1747974Z * [new tag] trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de -> trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de 2025-12-04T08:57:44.1748878Z * [new tag] trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 -> trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 2025-12-04T08:57:44.1749792Z * [new tag] trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 -> trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 2025-12-04T08:57:44.1750675Z * [new tag] trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f -> trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f 2025-12-04T08:57:44.1751853Z * [new tag] trunk/285779b1621cf9f073a062b0889a642d200308d9 -> trunk/285779b1621cf9f073a062b0889a642d200308d9 2025-12-04T08:57:44.1752570Z * [new tag] trunk/2887faaec6295d081580d09fce161201826c6d87 -> trunk/2887faaec6295d081580d09fce161201826c6d87 
2025-12-04T08:57:44.1753440Z * [new tag] trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc -> trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc 2025-12-04T08:57:44.1754447Z * [new tag] trunk/29856679769b3dede478767e2fe6cfb51197cb25 -> trunk/29856679769b3dede478767e2fe6cfb51197cb25 2025-12-04T08:57:44.1755396Z * [new tag] trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 -> trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 2025-12-04T08:57:44.1756323Z * [new tag] trunk/2ac3ef882afb23136adc188975f0a8802fc68adf -> trunk/2ac3ef882afb23136adc188975f0a8802fc68adf 2025-12-04T08:57:44.1757088Z * [new tag] trunk/2bec68e73b64715354af076ad309335f943e36cd -> trunk/2bec68e73b64715354af076ad309335f943e36cd 2025-12-04T08:57:44.1757936Z * [new tag] trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 -> trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 2025-12-04T08:57:44.1758877Z * [new tag] trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 -> trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 2025-12-04T08:57:44.1759789Z * [new tag] trunk/2df6058f116a65722a0e03073402feb242572d35 -> trunk/2df6058f116a65722a0e03073402feb242572d35 2025-12-04T08:57:44.1760737Z * [new tag] trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec -> trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec 2025-12-04T08:57:44.1761745Z * [new tag] trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 -> trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 2025-12-04T08:57:44.1762592Z * [new tag] trunk/305168768a95d69c444df5cd334bb774edfe06f1 -> trunk/305168768a95d69c444df5cd334bb774edfe06f1 2025-12-04T08:57:44.1763465Z * [new tag] trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 -> trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 2025-12-04T08:57:44.1764341Z * [new tag] trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 -> trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 2025-12-04T08:57:44.1765375Z * [new tag] trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 -> trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 2025-12-04T08:57:44.1766213Z * [new tag] trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf -> trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf 2025-12-04T08:57:44.1767123Z * [new tag] trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee -> trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee 2025-12-04T08:57:44.1768035Z * [new tag] trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 -> trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 2025-12-04T08:57:44.1768753Z * [new tag] trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 -> trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 2025-12-04T08:57:44.1769640Z * [new tag] trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae -> trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae 2025-12-04T08:57:44.1770508Z * [new tag] trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f -> trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f 2025-12-04T08:57:44.1771374Z * [new tag] trunk/42e9005cda22da3f1c559c3649218cebd671027c -> trunk/42e9005cda22da3f1c559c3649218cebd671027c 2025-12-04T08:57:44.1772283Z * [new tag] trunk/43b94713bbf340d3c124fde02d0f73add4021247 -> trunk/43b94713bbf340d3c124fde02d0f73add4021247 2025-12-04T08:57:44.1773227Z * [new tag] trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c -> trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c 2025-12-04T08:57:44.1774440Z * [new tag] trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a -> trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a 2025-12-04T08:57:44.1775749Z * [new tag] trunk/45d310ad84854dff730c0b12e577d7998d978686 -> trunk/45d310ad84854dff730c0b12e577d7998d978686 2025-12-04T08:57:44.1776944Z * [new tag] trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 -> 
trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 2025-12-04T08:57:44.1777649Z * [new tag] trunk/481e5ab336275bd3acd5fa8a611b05b4469012af -> trunk/481e5ab336275bd3acd5fa8a611b05b4469012af 2025-12-04T08:57:44.1778763Z * [new tag] trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 -> trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 2025-12-04T08:57:44.1779790Z * [new tag] trunk/49a04d26088acc17d948ddd66920f3e16371e873 -> trunk/49a04d26088acc17d948ddd66920f3e16371e873 2025-12-04T08:57:44.1780720Z * [new tag] trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 -> trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 2025-12-04T08:57:44.1781483Z * [new tag] trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f -> trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f 2025-12-04T08:57:44.1782698Z * [new tag] trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa -> trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa 2025-12-04T08:57:44.1783529Z * [new tag] trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c -> trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c 2025-12-04T08:57:44.1785211Z * [new tag] trunk/4fefb8e7e942386ffac764a41b232241f82bea3a -> trunk/4fefb8e7e942386ffac764a41b232241f82bea3a 2025-12-04T08:57:44.1785945Z * [new tag] trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d -> trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d 2025-12-04T08:57:44.1786861Z * [new tag] trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 -> trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 2025-12-04T08:57:44.1787791Z * [new tag] trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 -> trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 2025-12-04T08:57:44.1788785Z * [new tag] trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a -> trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a 2025-12-04T08:57:44.1789778Z * [new tag] trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 -> trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 2025-12-04T08:57:44.1790764Z * [new tag] trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 -> trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 2025-12-04T08:57:44.1791728Z * [new tag] trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 -> trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 2025-12-04T08:57:44.1792704Z * [new tag] trunk/5634469fda9e5d98869c82c7d03bb08914245f96 -> trunk/5634469fda9e5d98869c82c7d03bb08914245f96 2025-12-04T08:57:44.1793457Z * [new tag] trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc -> trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc 2025-12-04T08:57:44.1794363Z * [new tag] trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 -> trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 2025-12-04T08:57:44.1795233Z * [new tag] trunk/597930f6b568852356ca9795dac76f9e4653adbd -> trunk/597930f6b568852356ca9795dac76f9e4653adbd 2025-12-04T08:57:44.1796115Z * [new tag] trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 -> trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 2025-12-04T08:57:44.1797289Z * [new tag] trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 -> trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 2025-12-04T08:57:44.1798105Z * [new tag] trunk/5a607febc04c3a2b5824c75f3f60307867439a2c -> trunk/5a607febc04c3a2b5824c75f3f60307867439a2c 2025-12-04T08:57:44.1799015Z * [new tag] trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b -> trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b 2025-12-04T08:57:44.1799773Z * [new tag] trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c -> trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c 2025-12-04T08:57:44.1800589Z * [new tag] trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 -> trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 2025-12-04T08:57:44.1801545Z * [new tag] 
trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 -> trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 2025-12-04T08:57:44.1802534Z * [new tag] trunk/61be54a31dc09b59d99b62176fb935aee0b924ef -> trunk/61be54a31dc09b59d99b62176fb935aee0b924ef 2025-12-04T08:57:44.1803424Z * [new tag] trunk/62d3ccd71484ed6a760d909b41487101bbc65719 -> trunk/62d3ccd71484ed6a760d909b41487101bbc65719 2025-12-04T08:57:44.1804336Z * [new tag] trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b -> trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b 2025-12-04T08:57:44.1805215Z * [new tag] trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a -> trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a 2025-12-04T08:57:44.1806138Z * [new tag] trunk/66004b993744b4106bf8afaba71f3c228a804206 -> trunk/66004b993744b4106bf8afaba71f3c228a804206 2025-12-04T08:57:44.1807038Z * [new tag] trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 -> trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 2025-12-04T08:57:44.1807916Z * [new tag] trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 -> trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 2025-12-04T08:57:44.1808894Z * [new tag] trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d -> trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d 2025-12-04T08:57:44.1809725Z * [new tag] trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b -> trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b 2025-12-04T08:57:44.1810587Z * [new tag] trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 -> trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 2025-12-04T08:57:44.1811481Z * [new tag] trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 -> trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 2025-12-04T08:57:44.1812412Z * [new tag] trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec -> trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec 2025-12-04T08:57:44.1813539Z * [new tag] trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 -> trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 2025-12-04T08:57:44.1814619Z * [new tag] trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d -> trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d 2025-12-04T08:57:44.1815551Z * [new tag] trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a -> trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a 2025-12-04T08:57:44.1816530Z * [new tag] trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e -> trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e 2025-12-04T08:57:44.1817443Z * [new tag] trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 -> trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 2025-12-04T08:57:44.1818328Z * [new tag] trunk/70d797a5fc109b20a517646fcaa819477cd0d485 -> trunk/70d797a5fc109b20a517646fcaa819477cd0d485 2025-12-04T08:57:44.1819279Z * [new tag] trunk/7348cb355ff0a6f79cd4871215aea72185748734 -> trunk/7348cb355ff0a6f79cd4871215aea72185748734 2025-12-04T08:57:44.1820189Z * [new tag] trunk/74fe26a1ebe32931783569f2e762e3c2c974901f -> trunk/74fe26a1ebe32931783569f2e762e3c2c974901f 2025-12-04T08:57:44.1821187Z * [new tag] trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 -> trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 2025-12-04T08:57:44.1822118Z * [new tag] trunk/7741edd4ed665f3988052e260863efb508d61a03 -> trunk/7741edd4ed665f3988052e260863efb508d61a03 2025-12-04T08:57:44.1823090Z * [new tag] trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 -> trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 2025-12-04T08:57:44.1824077Z * [new tag] trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 -> trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 2025-12-04T08:57:44.1824847Z * [new tag] trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 -> trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 
2025-12-04T08:57:44.1825923Z * [new tag] trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca -> trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca 2025-12-04T08:57:44.1826798Z * [new tag] trunk/7b7af390ea8541c611d1ce2018a6934188fc197b -> trunk/7b7af390ea8541c611d1ce2018a6934188fc197b 2025-12-04T08:57:44.1827681Z * [new tag] trunk/7ba4680f3755a560af81aa0f688791e367aa3609 -> trunk/7ba4680f3755a560af81aa0f688791e367aa3609 2025-12-04T08:57:44.1828660Z * [new tag] trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b -> trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b 2025-12-04T08:57:44.1829399Z * [new tag] trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 2025-12-04T08:57:44.1830180Z * [new tag] trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 -> trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 2025-12-04T08:57:44.1831111Z * [new tag] trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed -> trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed 2025-12-04T08:57:44.1832000Z * [new tag] trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 -> trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 2025-12-04T08:57:44.1832994Z * [new tag] trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e -> trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e 2025-12-04T08:57:44.1833684Z * [new tag] trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead -> trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead 2025-12-04T08:57:44.1834593Z * [new tag] trunk/81af382128efa094d8702e18f2c133760904c718 -> trunk/81af382128efa094d8702e18f2c133760904c718 2025-12-04T08:57:44.1836224Z * [new tag] trunk/84149583d483e9c973c9a0feda70e4f3964947b0 -> trunk/84149583d483e9c973c9a0feda70e4f3964947b0 2025-12-04T08:57:44.1837401Z * [new tag] trunk/85a315917efe82c24306be805c584ec044951c75 -> trunk/85a315917efe82c24306be805c584ec044951c75 2025-12-04T08:57:44.1838248Z * [new tag] trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece -> trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece 2025-12-04T08:57:44.1839027Z * [new tag] trunk/892640e25aeefa8007c5af837214b4502b6b62a6 -> trunk/892640e25aeefa8007c5af837214b4502b6b62a6 2025-12-04T08:57:44.1840181Z * [new tag] trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 -> trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 2025-12-04T08:57:44.1840990Z * [new tag] trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c -> trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c 2025-12-04T08:57:44.1841938Z * [new tag] trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 -> trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 2025-12-04T08:57:44.1842919Z * [new tag] trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 -> trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 2025-12-04T08:57:44.1843880Z * [new tag] trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca -> trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca 2025-12-04T08:57:44.1844788Z * [new tag] trunk/90b27e7e8352cde97d32ddad24740ef819633f38 -> trunk/90b27e7e8352cde97d32ddad24740ef819633f38 2025-12-04T08:57:44.1845541Z * [new tag] trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 -> trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 2025-12-04T08:57:44.1846398Z * [new tag] trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c -> trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c 2025-12-04T08:57:44.1848077Z * [new tag] trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 -> trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 2025-12-04T08:57:44.1848525Z * [new tag] trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 -> trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 2025-12-04T08:57:44.1849893Z * [new tag] trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa -> 
trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa 2025-12-04T08:57:44.1850609Z * [new tag] trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d -> trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d 2025-12-04T08:57:44.1851135Z * [new tag] trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 -> trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 2025-12-04T08:57:44.1851942Z * [new tag] trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 -> trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 2025-12-04T08:57:44.1852819Z * [new tag] trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d -> trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d 2025-12-04T08:57:44.1854108Z * [new tag] trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a -> trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a 2025-12-04T08:57:44.1854998Z * [new tag] trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 -> trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 2025-12-04T08:57:44.1856233Z * [new tag] trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 -> trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 2025-12-04T08:57:44.1857103Z * [new tag] trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa -> trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa 2025-12-04T08:57:44.1857983Z * [new tag] trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d -> trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d 2025-12-04T08:57:44.1859304Z * [new tag] trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c -> trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c 2025-12-04T08:57:44.1860009Z * [new tag] trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 -> trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 2025-12-04T08:57:44.1860937Z * [new tag] trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c -> trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c 2025-12-04T08:57:44.1861726Z * [new tag] trunk/a7dc6dab9ad911259d4801c502907e531594db45 -> trunk/a7dc6dab9ad911259d4801c502907e531594db45 2025-12-04T08:57:44.1862707Z * [new tag] trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 -> trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 2025-12-04T08:57:44.1863718Z * [new tag] trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e -> trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e 2025-12-04T08:57:44.1864607Z * [new tag] trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e -> trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e 2025-12-04T08:57:44.1865555Z * [new tag] trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e -> trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e 2025-12-04T08:57:44.1866304Z * [new tag] trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 -> trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 2025-12-04T08:57:44.1867254Z * [new tag] trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 -> trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 2025-12-04T08:57:44.1868212Z * [new tag] trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 -> trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 2025-12-04T08:57:44.1869135Z * [new tag] trunk/b39813b4a04931682b0491adba2138d01d716d99 -> trunk/b39813b4a04931682b0491adba2138d01d716d99 2025-12-04T08:57:44.1870061Z * [new tag] trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 -> trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 2025-12-04T08:57:44.1871017Z * [new tag] trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 -> trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 2025-12-04T08:57:44.1872002Z * [new tag] trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a -> trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a 2025-12-04T08:57:44.1872923Z * [new tag] trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 -> trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 2025-12-04T08:57:44.1873856Z * [new tag] 
trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 -> trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 2025-12-04T08:57:44.1874798Z * [new tag] trunk/b7d60685f8cbc939b68a20871e90db67e729329b -> trunk/b7d60685f8cbc939b68a20871e90db67e729329b 2025-12-04T08:57:44.1875758Z * [new tag] trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e -> trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e 2025-12-04T08:57:44.1876674Z * [new tag] trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf -> trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf 2025-12-04T08:57:44.1877555Z * [new tag] trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 -> trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 2025-12-04T08:57:44.1878563Z * [new tag] trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f -> trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f 2025-12-04T08:57:44.1882880Z * [new tag] trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f -> trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f 2025-12-04T08:57:44.1883866Z * [new tag] trunk/bb3034198b459401fabeab254e1b99f0115046e2 -> trunk/bb3034198b459401fabeab254e1b99f0115046e2 2025-12-04T08:57:44.1884822Z * [new tag] trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 -> trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 2025-12-04T08:57:44.1886015Z * [new tag] trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 -> trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 2025-12-04T08:57:44.1887078Z * [new tag] trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 -> trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 2025-12-04T08:57:44.1887898Z * [new tag] trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 -> trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 2025-12-04T08:57:44.1889207Z * [new tag] trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 -> trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 2025-12-04T08:57:44.1890221Z * [new tag] trunk/c0660bcee27e7d7731634e274576a7081882bede -> trunk/c0660bcee27e7d7731634e274576a7081882bede 2025-12-04T08:57:44.1891295Z * [new tag] trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac -> trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac 2025-12-04T08:57:44.1892206Z * [new tag] trunk/c55b1e8f61d041ee436d697449eb028931d574fb -> trunk/c55b1e8f61d041ee436d697449eb028931d574fb 2025-12-04T08:57:44.1893066Z * [new tag] trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 -> trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 2025-12-04T08:57:44.1894557Z * [new tag] trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 -> trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 2025-12-04T08:57:44.1895502Z * [new tag] trunk/cc0853af42122f8185321f542616f4474e717f09 -> trunk/cc0853af42122f8185321f542616f4474e717f09 2025-12-04T08:57:44.1896398Z * [new tag] trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 -> trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 2025-12-04T08:57:44.1897416Z * [new tag] trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a -> trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a 2025-12-04T08:57:44.1898390Z * [new tag] trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace -> trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace 2025-12-04T08:57:44.1899267Z * [new tag] trunk/d16447dacaf2420ea175f0c275c75da951f57d39 -> trunk/d16447dacaf2420ea175f0c275c75da951f57d39 2025-12-04T08:57:44.1900196Z * [new tag] trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 -> trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 2025-12-04T08:57:44.1901599Z * [new tag] trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 -> trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 2025-12-04T08:57:44.1902632Z * [new tag] trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf -> trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf 
2025-12-04T08:57:44.1903565Z * [new tag] trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 -> trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 2025-12-04T08:57:44.1904432Z * [new tag] trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d -> trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d 2025-12-04T08:57:44.1905521Z * [new tag] trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 -> trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 2025-12-04T08:57:44.1906407Z * [new tag] trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 -> trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 2025-12-04T08:57:44.1907317Z * [new tag] trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e -> trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e 2025-12-04T08:57:44.1908288Z * [new tag] trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a -> trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a 2025-12-04T08:57:44.1909212Z * [new tag] trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b -> trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b 2025-12-04T08:57:44.1910126Z * [new tag] trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec -> trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec 2025-12-04T08:57:44.1911028Z * [new tag] trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf -> trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf 2025-12-04T08:57:44.1911910Z * [new tag] trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd -> trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd 2025-12-04T08:57:44.1912859Z * [new tag] trunk/dd18a75336a4fbd7497955cc5665904724fce889 -> trunk/dd18a75336a4fbd7497955cc5665904724fce889 2025-12-04T08:57:44.1913846Z * [new tag] trunk/ded9bcd61a059bf723e6e84689552962b480ea77 -> trunk/ded9bcd61a059bf723e6e84689552962b480ea77 2025-12-04T08:57:44.1915107Z * [new tag] trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c -> trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c 2025-12-04T08:57:44.1916036Z * [new tag] trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b -> trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b 2025-12-04T08:57:44.1916845Z * [new tag] trunk/e3f24fd73ad74c6e7176687986436956c7c18235 -> trunk/e3f24fd73ad74c6e7176687986436956c7c18235 2025-12-04T08:57:44.1917808Z * [new tag] trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e -> trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e 2025-12-04T08:57:44.1918838Z * [new tag] trunk/ea7035f462a0d2830865ee86c832bd101e1427fc -> trunk/ea7035f462a0d2830865ee86c832bd101e1427fc 2025-12-04T08:57:44.1919675Z * [new tag] trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 -> trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 2025-12-04T08:57:44.1920628Z * [new tag] trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf -> trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf 2025-12-04T08:57:44.1921648Z * [new tag] trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e -> trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e 2025-12-04T08:57:44.1922566Z * [new tag] trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e -> trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e 2025-12-04T08:57:44.1924045Z * [new tag] trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 -> trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 2025-12-04T08:57:44.1924893Z * [new tag] trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 -> trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 2025-12-04T08:57:44.1925815Z * [new tag] trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 -> trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 2025-12-04T08:57:44.1926700Z * [new tag] trunk/f1076f5510920044912247b1abb8760cb820f598 -> trunk/f1076f5510920044912247b1abb8760cb820f598 2025-12-04T08:57:44.1927641Z * [new tag] trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 -> 
trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 2025-12-04T08:57:44.1928526Z * [new tag] trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 -> trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 2025-12-04T08:57:44.1929468Z * [new tag] trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 -> trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 2025-12-04T08:57:44.1930330Z * [new tag] trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 -> trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 2025-12-04T08:57:44.1931271Z * [new tag] trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 -> trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 2025-12-04T08:57:44.1932191Z * [new tag] trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 -> trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 2025-12-04T08:57:44.1932972Z * [new tag] trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 -> trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 2025-12-04T08:57:44.1934400Z * [new tag] trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b -> trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b 2025-12-04T08:57:44.1935300Z * [new tag] trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 -> trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 2025-12-04T08:57:44.1936833Z * [new tag] trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 -> trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 2025-12-04T08:57:44.1937762Z * [new tag] trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 -> trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 2025-12-04T08:57:44.1938741Z * [new tag] trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 -> trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:57:44.1939443Z * [new tag] v0.1.1 -> v0.1.1 2025-12-04T08:57:44.1940386Z * [new tag] v0.1.10 -> v0.1.10 2025-12-04T08:57:44.1941130Z * [new tag] v0.1.11 -> v0.1.11 2025-12-04T08:57:44.1942040Z * [new tag] v0.1.12 -> v0.1.12 2025-12-04T08:57:44.1942947Z * [new tag] v0.1.2 -> v0.1.2 2025-12-04T08:57:44.1943864Z * [new tag] v0.1.3 -> v0.1.3 2025-12-04T08:57:44.1944605Z * [new tag] v0.1.4 -> v0.1.4 2025-12-04T08:57:44.1945608Z * [new tag] v0.1.5 -> v0.1.5 2025-12-04T08:57:44.1946460Z * [new tag] v0.1.6 -> v0.1.6 2025-12-04T08:57:44.1947403Z * [new tag] v0.1.7 -> v0.1.7 2025-12-04T08:57:44.1948147Z * [new tag] v0.1.8 -> v0.1.8 2025-12-04T08:57:44.1948982Z * [new tag] v0.1.9 -> v0.1.9 2025-12-04T08:57:44.1949877Z * [new tag] v0.2.0 -> v0.2.0 2025-12-04T08:57:44.1950753Z * [new tag] v0.3.0 -> v0.3.0 2025-12-04T08:57:44.1951837Z * [new tag] v0.3.1 -> v0.3.1 2025-12-04T08:57:44.1952666Z * [new tag] v0.4.0 -> v0.4.0 2025-12-04T08:57:44.1953411Z * [new tag] v0.4.1 -> v0.4.1 2025-12-04T08:57:44.1954299Z * [new tag] v1.0.0 -> v1.0.0 2025-12-04T08:57:44.1955173Z * [new tag] v1.0.0a0 -> v1.0.0a0 2025-12-04T08:57:44.1956026Z * [new tag] v1.0.1 -> v1.0.1 2025-12-04T08:57:44.1956906Z * [new tag] v1.0rc0 -> v1.0rc0 2025-12-04T08:57:44.1957492Z * [new tag] v1.0rc1 -> v1.0rc1 2025-12-04T08:57:44.1958425Z * [new tag] v1.1.0 -> v1.1.0 2025-12-04T08:57:44.1959288Z * [new tag] v1.1.0a0 -> v1.1.0a0 2025-12-04T08:57:44.1960367Z * [new tag] v1.10.0 -> v1.10.0 2025-12-04T08:57:44.1961288Z * [new tag] v1.10.0-rc1 -> v1.10.0-rc1 2025-12-04T08:57:44.1962192Z * [new tag] v1.10.0-rc2 -> v1.10.0-rc2 2025-12-04T08:57:44.1962779Z * [new tag] v1.10.0-rc3 -> v1.10.0-rc3 2025-12-04T08:57:44.1963729Z * [new tag] v1.10.1 -> v1.10.1 2025-12-04T08:57:44.1964335Z * [new tag] v1.10.1-rc1 -> v1.10.1-rc1 2025-12-04T08:57:44.1965054Z * [new tag] v1.10.2 -> v1.10.2 2025-12-04T08:57:44.1965729Z * [new tag] v1.10.2-rc1 -> v1.10.2-rc1 2025-12-04T08:57:44.1966638Z * [new tag] v1.11.0 -> v1.11.0 
2025-12-04T08:57:44.1967601Z * [new tag] v1.11.0-rc1 -> v1.11.0-rc1 2025-12-04T08:57:44.1968665Z * [new tag] v1.11.0-rc2 -> v1.11.0-rc2 2025-12-04T08:57:44.1969584Z * [new tag] v1.11.0-rc3 -> v1.11.0-rc3 2025-12-04T08:57:44.1970529Z * [new tag] v1.11.0-rc4 -> v1.11.0-rc4 2025-12-04T08:57:44.1971439Z * [new tag] v1.11.0-rc5 -> v1.11.0-rc5 2025-12-04T08:57:44.1972156Z * [new tag] v1.11.0-rc6 -> v1.11.0-rc6 2025-12-04T08:57:44.1972753Z * [new tag] v1.11.0-rc7 -> v1.11.0-rc7 2025-12-04T08:57:44.1974107Z * [new tag] v1.12.0 -> v1.12.0 2025-12-04T08:57:44.1974863Z * [new tag] v1.12.0-rc1 -> v1.12.0-rc1 2025-12-04T08:57:44.1975918Z * [new tag] v1.12.0-rc2 -> v1.12.0-rc2 2025-12-04T08:57:44.1976801Z * [new tag] v1.12.0-rc3 -> v1.12.0-rc3 2025-12-04T08:57:44.1977776Z * [new tag] v1.12.0-rc4 -> v1.12.0-rc4 2025-12-04T08:57:44.1978581Z * [new tag] v1.12.0-rc5 -> v1.12.0-rc5 2025-12-04T08:57:44.1979979Z * [new tag] v1.12.0-rc6 -> v1.12.0-rc6 2025-12-04T08:57:44.1980665Z * [new tag] v1.12.0-rc7 -> v1.12.0-rc7 2025-12-04T08:57:44.1981382Z * [new tag] v1.12.0-rc8 -> v1.12.0-rc8 2025-12-04T08:57:44.1982075Z * [new tag] v1.12.1 -> v1.12.1 2025-12-04T08:57:44.1983169Z * [new tag] v1.12.1-rc1 -> v1.12.1-rc1 2025-12-04T08:57:44.1984070Z * [new tag] v1.12.1-rc2 -> v1.12.1-rc2 2025-12-04T08:57:44.1985094Z * [new tag] v1.12.1-rc3 -> v1.12.1-rc3 2025-12-04T08:57:44.1986006Z * [new tag] v1.12.1-rc4 -> v1.12.1-rc4 2025-12-04T08:57:44.1986737Z * [new tag] v1.12.1-rc5 -> v1.12.1-rc5 2025-12-04T08:57:44.1988199Z * [new tag] v1.13.0 -> v1.13.0 2025-12-04T08:57:44.1989137Z * [new tag] v1.13.0-rc1 -> v1.13.0-rc1 2025-12-04T08:57:44.1990036Z * [new tag] v1.13.0-rc2 -> v1.13.0-rc2 2025-12-04T08:57:44.1991015Z * [new tag] v1.13.0-rc3 -> v1.13.0-rc3 2025-12-04T08:57:44.1992043Z * [new tag] v1.13.0-rc4 -> v1.13.0-rc4 2025-12-04T08:57:44.1992755Z * [new tag] v1.13.0-rc5 -> v1.13.0-rc5 2025-12-04T08:57:44.1993368Z * [new tag] v1.13.0-rc6 -> v1.13.0-rc6 2025-12-04T08:57:44.1994362Z * [new tag] v1.13.1 -> v1.13.1 2025-12-04T08:57:44.1995031Z * [new tag] v1.13.1-rc1 -> v1.13.1-rc1 2025-12-04T08:57:44.1995904Z * [new tag] v1.2.0 -> v1.2.0 2025-12-04T08:57:44.1996779Z * [new tag] v1.2.0a0 -> v1.2.0a0 2025-12-04T08:57:44.1997635Z * [new tag] v1.3.0 -> v1.3.0 2025-12-04T08:57:44.1998396Z * [new tag] v1.3.0a0 -> v1.3.0a0 2025-12-04T08:57:44.1999077Z * [new tag] v1.3.1 -> v1.3.1 2025-12-04T08:57:44.1999984Z * [new tag] v1.4.0 -> v1.4.0 2025-12-04T08:57:44.2000871Z * [new tag] v1.4.0a0 -> v1.4.0a0 2025-12-04T08:57:44.2001421Z * [new tag] v1.4.1 -> v1.4.1 2025-12-04T08:57:44.2002458Z * [new tag] v1.5.0 -> v1.5.0 2025-12-04T08:57:44.2003398Z * [new tag] v1.5.0-rc1 -> v1.5.0-rc1 2025-12-04T08:57:44.2004303Z * [new tag] v1.5.0-rc2 -> v1.5.0-rc2 2025-12-04T08:57:44.2005271Z * [new tag] v1.5.0-rc3 -> v1.5.0-rc3 2025-12-04T08:57:44.2006051Z * [new tag] v1.5.0-rc4 -> v1.5.0-rc4 2025-12-04T08:57:44.2006731Z * [new tag] v1.5.0-rc5 -> v1.5.0-rc5 2025-12-04T08:57:44.2007795Z * [new tag] v1.5.1 -> v1.5.1 2025-12-04T08:57:44.2008454Z * [new tag] v1.5.1-rc1 -> v1.5.1-rc1 2025-12-04T08:57:44.2009145Z * [new tag] v1.6.0 -> v1.6.0 2025-12-04T08:57:44.2010123Z * [new tag] v1.6.0-rc1 -> v1.6.0-rc1 2025-12-04T08:57:44.2011093Z * [new tag] v1.6.0-rc2 -> v1.6.0-rc2 2025-12-04T08:57:44.2011991Z * [new tag] v1.6.0-rc3 -> v1.6.0-rc3 2025-12-04T08:57:44.2012967Z * [new tag] v1.6.0-rc4 -> v1.6.0-rc4 2025-12-04T08:57:44.2014076Z * [new tag] v1.6.0-rc5 -> v1.6.0-rc5 2025-12-04T08:57:44.2015025Z * [new tag] v1.6.0-rc6 -> v1.6.0-rc6 2025-12-04T08:57:44.2015645Z * [new 
tag] v1.6.0-rc7 -> v1.6.0-rc7 2025-12-04T08:57:44.2016636Z * [new tag] v1.7.0 -> v1.7.0 2025-12-04T08:57:44.2017593Z * [new tag] v1.7.0-rc1 -> v1.7.0-rc1 2025-12-04T08:57:44.2018663Z * [new tag] v1.7.0-rc2 -> v1.7.0-rc2 2025-12-04T08:57:44.2019595Z * [new tag] v1.7.0-rc3 -> v1.7.0-rc3 2025-12-04T08:57:44.2020271Z * [new tag] v1.7.0-rc4 -> v1.7.0-rc4 2025-12-04T08:57:44.2021234Z * [new tag] v1.7.1 -> v1.7.1 2025-12-04T08:57:44.2022332Z * [new tag] v1.7.1-rc1 -> v1.7.1-rc1 2025-12-04T08:57:44.2023289Z * [new tag] v1.7.1-rc2 -> v1.7.1-rc2 2025-12-04T08:57:44.2023980Z * [new tag] v1.7.1-rc3 -> v1.7.1-rc3 2025-12-04T08:57:44.2024944Z * [new tag] v1.8.0 -> v1.8.0 2025-12-04T08:57:44.2025741Z * [new tag] v1.8.0-rc1 -> v1.8.0-rc1 2025-12-04T08:57:44.2026684Z * [new tag] v1.8.0-rc2 -> v1.8.0-rc2 2025-12-04T08:57:44.2027582Z * [new tag] v1.8.0-rc3 -> v1.8.0-rc3 2025-12-04T08:57:44.2028468Z * [new tag] v1.8.0-rc4 -> v1.8.0-rc4 2025-12-04T08:57:44.2029035Z * [new tag] v1.8.0-rc5 -> v1.8.0-rc5 2025-12-04T08:57:44.2029742Z * [new tag] v1.8.1 -> v1.8.1 2025-12-04T08:57:44.2030684Z * [new tag] v1.8.1-rc1 -> v1.8.1-rc1 2025-12-04T08:57:44.2031327Z * [new tag] v1.8.1-rc2 -> v1.8.1-rc2 2025-12-04T08:57:44.2032083Z * [new tag] v1.8.1-rc3 -> v1.8.1-rc3 2025-12-04T08:57:44.2033538Z * [new tag] v1.8.2 -> v1.8.2 2025-12-04T08:57:44.2034224Z * [new tag] v1.8.2-rc1 -> v1.8.2-rc1 2025-12-04T08:57:44.2035133Z * [new tag] v1.9.0 -> v1.9.0 2025-12-04T08:57:44.2036239Z * [new tag] v1.9.0-rc1 -> v1.9.0-rc1 2025-12-04T08:57:44.2037236Z * [new tag] v1.9.0-rc2 -> v1.9.0-rc2 2025-12-04T08:57:44.2038178Z * [new tag] v1.9.0-rc3 -> v1.9.0-rc3 2025-12-04T08:57:44.2038836Z * [new tag] v1.9.0-rc4 -> v1.9.0-rc4 2025-12-04T08:57:44.2039761Z * [new tag] v1.9.1 -> v1.9.1 2025-12-04T08:57:44.2040891Z * [new tag] v1.9.1-rc1 -> v1.9.1-rc1 2025-12-04T08:57:44.2041533Z * [new tag] v1.9.1-rc2 -> v1.9.1-rc2 2025-12-04T08:57:44.2042508Z * [new tag] v2.0.0 -> v2.0.0 2025-12-04T08:57:44.2043378Z * [new tag] v2.0.0-rc1 -> v2.0.0-rc1 2025-12-04T08:57:44.2044678Z * [new tag] v2.0.0-rc2 -> v2.0.0-rc2 2025-12-04T08:57:44.2045644Z * [new tag] v2.0.0-rc3 -> v2.0.0-rc3 2025-12-04T08:57:44.2046503Z * [new tag] v2.0.0-rc4 -> v2.0.0-rc4 2025-12-04T08:57:44.2047429Z * [new tag] v2.0.0-rc5 -> v2.0.0-rc5 2025-12-04T08:57:44.2048132Z * [new tag] v2.0.0-rc6 -> v2.0.0-rc6 2025-12-04T08:57:44.2049071Z * [new tag] v2.0.1 -> v2.0.1 2025-12-04T08:57:44.2050104Z * [new tag] v2.0.1-rc1 -> v2.0.1-rc1 2025-12-04T08:57:44.2050658Z * [new tag] v2.0.1-rc2 -> v2.0.1-rc2 2025-12-04T08:57:44.2051508Z * [new tag] v2.0.1-rc3 -> v2.0.1-rc3 2025-12-04T08:57:44.2052184Z * [new tag] v2.0.1-rc4 -> v2.0.1-rc4 2025-12-04T08:57:44.2053990Z * [new tag] v2.1.0 -> v2.1.0 2025-12-04T08:57:44.2054815Z * [new tag] v2.1.0-rc1 -> v2.1.0-rc1 2025-12-04T08:57:44.2055803Z * [new tag] v2.1.0-rc2 -> v2.1.0-rc2 2025-12-04T08:57:44.2056837Z * [new tag] v2.1.0-rc3 -> v2.1.0-rc3 2025-12-04T08:57:44.2057816Z * [new tag] v2.1.0-rc4 -> v2.1.0-rc4 2025-12-04T08:57:44.2058792Z * [new tag] v2.1.0-rc5 -> v2.1.0-rc5 2025-12-04T08:57:44.2059480Z * [new tag] v2.1.0-rc6 -> v2.1.0-rc6 2025-12-04T08:57:44.2060412Z * [new tag] v2.1.1 -> v2.1.1 2025-12-04T08:57:44.2061493Z * [new tag] v2.1.1-rc1 -> v2.1.1-rc1 2025-12-04T08:57:44.2062467Z * [new tag] v2.1.1-rc2 -> v2.1.1-rc2 2025-12-04T08:57:44.2063496Z * [new tag] v2.1.1-rc3 -> v2.1.1-rc3 2025-12-04T08:57:44.2064469Z * [new tag] v2.1.1-rc4 -> v2.1.1-rc4 2025-12-04T08:57:44.2065461Z * [new tag] v2.1.1-rc5 -> v2.1.1-rc5 2025-12-04T08:57:44.2066212Z * [new tag] 
v2.1.1-rc6 -> v2.1.1-rc6 2025-12-04T08:57:44.2067105Z * [new tag] v2.1.2 -> v2.1.2 2025-12-04T08:57:44.2068075Z * [new tag] v2.1.2-rc1 -> v2.1.2-rc1 2025-12-04T08:57:44.2069035Z * [new tag] v2.1.2-rc2 -> v2.1.2-rc2 2025-12-04T08:57:44.2069682Z * [new tag] v2.1.2-rc3 -> v2.1.2-rc3 2025-12-04T08:57:44.2070660Z * [new tag] v2.2.0 -> v2.2.0 2025-12-04T08:57:44.2071533Z * [new tag] v2.2.0-rc1 -> v2.2.0-rc1 2025-12-04T08:57:44.2072438Z * [new tag] v2.2.0-rc2 -> v2.2.0-rc2 2025-12-04T08:57:44.2073208Z * [new tag] v2.2.0-rc3 -> v2.2.0-rc3 2025-12-04T08:57:44.2074133Z * [new tag] v2.2.0-rc4 -> v2.2.0-rc4 2025-12-04T08:57:44.2075050Z * [new tag] v2.2.0-rc5 -> v2.2.0-rc5 2025-12-04T08:57:44.2075908Z * [new tag] v2.2.0-rc6 -> v2.2.0-rc6 2025-12-04T08:57:44.2076560Z * [new tag] v2.2.0-rc7 -> v2.2.0-rc7 2025-12-04T08:57:44.2077234Z * [new tag] v2.2.0-rc8 -> v2.2.0-rc8 2025-12-04T08:57:44.2078229Z * [new tag] v2.2.1 -> v2.2.1 2025-12-04T08:57:44.2079633Z * [new tag] v2.2.1-rc1 -> v2.2.1-rc1 2025-12-04T08:57:44.2080287Z * [new tag] v2.2.1-rc2 -> v2.2.1-rc2 2025-12-04T08:57:44.2081047Z * [new tag] v2.2.1-rc3 -> v2.2.1-rc3 2025-12-04T08:57:44.2081692Z * [new tag] v2.2.2 -> v2.2.2 2025-12-04T08:57:44.2082872Z * [new tag] v2.2.2-rc1 -> v2.2.2-rc1 2025-12-04T08:57:44.2083526Z * [new tag] v2.2.2-rc2 -> v2.2.2-rc2 2025-12-04T08:57:44.2084259Z * [new tag] v2.2.2-rc3 -> v2.2.2-rc3 2025-12-04T08:57:44.2085249Z * [new tag] v2.3.0 -> v2.3.0 2025-12-04T08:57:44.2086195Z * [new tag] v2.3.0-rc1 -> v2.3.0-rc1 2025-12-04T08:57:44.2087344Z * [new tag] v2.3.0-rc10 -> v2.3.0-rc10 2025-12-04T08:57:44.2088188Z * [new tag] v2.3.0-rc11 -> v2.3.0-rc11 2025-12-04T08:57:44.2088888Z * [new tag] v2.3.0-rc12 -> v2.3.0-rc12 2025-12-04T08:57:44.2089838Z * [new tag] v2.3.0-rc2 -> v2.3.0-rc2 2025-12-04T08:57:44.2090832Z * [new tag] v2.3.0-rc3 -> v2.3.0-rc3 2025-12-04T08:57:44.2091882Z * [new tag] v2.3.0-rc4 -> v2.3.0-rc4 2025-12-04T08:57:44.2092742Z * [new tag] v2.3.0-rc5 -> v2.3.0-rc5 2025-12-04T08:57:44.2093704Z * [new tag] v2.3.0-rc6 -> v2.3.0-rc6 2025-12-04T08:57:44.2094865Z * [new tag] v2.3.0-rc7 -> v2.3.0-rc7 2025-12-04T08:57:44.2095860Z * [new tag] v2.3.0-rc8 -> v2.3.0-rc8 2025-12-04T08:57:44.2096533Z * [new tag] v2.3.0-rc9 -> v2.3.0-rc9 2025-12-04T08:57:44.2097227Z * [new tag] v2.3.1 -> v2.3.1 2025-12-04T08:57:44.2098283Z * [new tag] v2.3.1-rc1 -> v2.3.1-rc1 2025-12-04T08:57:44.2099217Z * [new tag] v2.3.1-rc2 -> v2.3.1-rc2 2025-12-04T08:57:44.2100199Z * [new tag] v2.3.1-rc3 -> v2.3.1-rc3 2025-12-04T08:57:44.2101129Z * [new tag] v2.4.0 -> v2.4.0 2025-12-04T08:57:44.2102053Z * [new tag] v2.4.0-rc1 -> v2.4.0-rc1 2025-12-04T08:57:44.2103456Z * [new tag] v2.4.0-rc2 -> v2.4.0-rc2 2025-12-04T08:57:44.2104321Z * [new tag] v2.4.0-rc3 -> v2.4.0-rc3 2025-12-04T08:57:44.2105269Z * [new tag] v2.4.0-rc4 -> v2.4.0-rc4 2025-12-04T08:57:44.2106367Z * [new tag] v2.4.0-rc5 -> v2.4.0-rc5 2025-12-04T08:57:44.2107297Z * [new tag] v2.4.0-rc6 -> v2.4.0-rc6 2025-12-04T08:57:44.2108241Z * [new tag] v2.4.0-rc7 -> v2.4.0-rc7 2025-12-04T08:57:44.2109137Z * [new tag] v2.4.0-rc8 -> v2.4.0-rc8 2025-12-04T08:57:44.2110070Z * [new tag] v2.4.0-rc9 -> v2.4.0-rc9 2025-12-04T08:57:44.2110719Z * [new tag] v2.4.1 -> v2.4.1 2025-12-04T08:57:44.2111789Z * [new tag] v2.4.1-rc1 -> v2.4.1-rc1 2025-12-04T08:57:44.2112687Z * [new tag] v2.4.1-rc2 -> v2.4.1-rc2 2025-12-04T08:57:44.2113666Z * [new tag] v2.4.1-rc3 -> v2.4.1-rc3 2025-12-04T08:57:44.2114570Z * [new tag] v2.5.0 -> v2.5.0 2025-12-04T08:57:44.2115435Z * [new tag] v2.5.0-rc1 -> v2.5.0-rc1 2025-12-04T08:57:44.2116115Z * 
[new tag] v2.5.0-rc10 -> v2.5.0-rc10 2025-12-04T08:57:44.2117065Z * [new tag] v2.5.0-rc2 -> v2.5.0-rc2 2025-12-04T08:57:44.2117967Z * [new tag] v2.5.0-rc3 -> v2.5.0-rc3 2025-12-04T08:57:44.2118859Z * [new tag] v2.5.0-rc4 -> v2.5.0-rc4 2025-12-04T08:57:44.2119814Z * [new tag] v2.5.0-rc5 -> v2.5.0-rc5 2025-12-04T08:57:44.2120793Z * [new tag] v2.5.0-rc6 -> v2.5.0-rc6 2025-12-04T08:57:44.2121742Z * [new tag] v2.5.0-rc7 -> v2.5.0-rc7 2025-12-04T08:57:44.2122672Z * [new tag] v2.5.0-rc8 -> v2.5.0-rc8 2025-12-04T08:57:44.2123591Z * [new tag] v2.5.0-rc9 -> v2.5.0-rc9 2025-12-04T08:57:44.2124335Z * [new tag] v2.5.1 -> v2.5.1 2025-12-04T08:57:44.2125138Z * [new tag] v2.5.1-rc1 -> v2.5.1-rc1 2025-12-04T08:57:44.2125740Z * [new tag] v2.6.0 -> v2.6.0 2025-12-04T08:57:44.2126774Z * [new tag] v2.6.0-rc1 -> v2.6.0-rc1 2025-12-04T08:57:44.2127759Z * [new tag] v2.6.0-rc2 -> v2.6.0-rc2 2025-12-04T08:57:44.2128857Z * [new tag] v2.6.0-rc3 -> v2.6.0-rc3 2025-12-04T08:57:44.2129635Z * [new tag] v2.6.0-rc4 -> v2.6.0-rc4 2025-12-04T08:57:44.2130871Z * [new tag] v2.6.0-rc5 -> v2.6.0-rc5 2025-12-04T08:57:44.2131938Z * [new tag] v2.6.0-rc6 -> v2.6.0-rc6 2025-12-04T08:57:44.2132917Z * [new tag] v2.6.0-rc7 -> v2.6.0-rc7 2025-12-04T08:57:44.2134364Z * [new tag] v2.6.0-rc8 -> v2.6.0-rc8 2025-12-04T08:57:44.2135324Z * [new tag] v2.6.0-rc9 -> v2.6.0-rc9 2025-12-04T08:57:44.2136523Z * [new tag] v2.7.0 -> v2.7.0 2025-12-04T08:57:44.2137503Z * [new tag] v2.7.0-rc1 -> v2.7.0-rc1 2025-12-04T08:57:44.2138167Z * [new tag] v2.7.0-rc10 -> v2.7.0-rc10 2025-12-04T08:57:44.2139287Z * [new tag] v2.7.0-rc2 -> v2.7.0-rc2 2025-12-04T08:57:44.2140308Z * [new tag] v2.7.0-rc3 -> v2.7.0-rc3 2025-12-04T08:57:44.2141292Z * [new tag] v2.7.0-rc4 -> v2.7.0-rc4 2025-12-04T08:57:44.2142196Z * [new tag] v2.7.0-rc5 -> v2.7.0-rc5 2025-12-04T08:57:44.2143094Z * [new tag] v2.7.0-rc6 -> v2.7.0-rc6 2025-12-04T08:57:44.2144061Z * [new tag] v2.7.0-rc7 -> v2.7.0-rc7 2025-12-04T08:57:44.2145112Z * [new tag] v2.7.0-rc8 -> v2.7.0-rc8 2025-12-04T08:57:44.2146191Z * [new tag] v2.7.0-rc9 -> v2.7.0-rc9 2025-12-04T08:57:44.2146836Z * [new tag] v2.7.1 -> v2.7.1 2025-12-04T08:57:44.2147903Z * [new tag] v2.7.1-rc1 -> v2.7.1-rc1 2025-12-04T08:57:44.2148889Z * [new tag] v2.7.1-rc2 -> v2.7.1-rc2 2025-12-04T08:57:44.2149868Z * [new tag] v2.7.1-rc3 -> v2.7.1-rc3 2025-12-04T08:57:44.2150848Z * [new tag] v2.7.1-rc4 -> v2.7.1-rc4 2025-12-04T08:57:44.2151745Z * [new tag] v2.7.1-rc5 -> v2.7.1-rc5 2025-12-04T08:57:44.2152430Z * [new tag] v2.8.0 -> v2.8.0 2025-12-04T08:57:44.2153398Z * [new tag] v2.8.0-rc1 -> v2.8.0-rc1 2025-12-04T08:57:44.2154342Z * [new tag] v2.8.0-rc2 -> v2.8.0-rc2 2025-12-04T08:57:44.2155467Z * [new tag] v2.8.0-rc3 -> v2.8.0-rc3 2025-12-04T08:57:44.2156541Z * [new tag] v2.8.0-rc4 -> v2.8.0-rc4 2025-12-04T08:57:44.2157521Z * [new tag] v2.8.0-rc5 -> v2.8.0-rc5 2025-12-04T08:57:44.2158510Z * [new tag] v2.8.0-rc6 -> v2.8.0-rc6 2025-12-04T08:57:44.2159425Z * [new tag] v2.8.0-rc7 -> v2.8.0-rc7 2025-12-04T08:57:44.2160356Z * [new tag] v2.8.0-rc8 -> v2.8.0-rc8 2025-12-04T08:57:44.2161311Z * [new tag] v2.9.0 -> v2.9.0 2025-12-04T08:57:44.2162317Z * [new tag] v2.9.0-rc1 -> v2.9.0-rc1 2025-12-04T08:57:44.2163236Z * [new tag] v2.9.0-rc10 -> v2.9.0-rc10 2025-12-04T08:57:44.2164647Z * [new tag] v2.9.0-rc11 -> v2.9.0-rc11 2025-12-04T08:57:44.2165948Z * [new tag] v2.9.0-rc2 -> v2.9.0-rc2 2025-12-04T08:57:44.2166851Z * [new tag] v2.9.0-rc3 -> v2.9.0-rc3 2025-12-04T08:57:44.2167814Z * [new tag] v2.9.0-rc4 -> v2.9.0-rc4 2025-12-04T08:57:44.2168821Z * [new tag] v2.9.0-rc5 -> 
v2.9.0-rc5 2025-12-04T08:57:44.2170014Z * [new tag] v2.9.0-rc6 -> v2.9.0-rc6 2025-12-04T08:57:44.2170948Z * [new tag] v2.9.0-rc7 -> v2.9.0-rc7 2025-12-04T08:57:44.2172122Z * [new tag] v2.9.0-rc8 -> v2.9.0-rc8 2025-12-04T08:57:44.2172821Z * [new tag] v2.9.0-rc9 -> v2.9.0-rc9 2025-12-04T08:57:44.2173807Z * [new tag] v2.9.1 -> v2.9.1 2025-12-04T08:57:44.2174911Z * [new tag] v2.9.1-rc1 -> v2.9.1-rc1 2025-12-04T08:57:44.2175893Z * [new tag] v2.9.1-rc2 -> v2.9.1-rc2 2025-12-04T08:57:44.2177174Z * [new tag] viable/strict/1759343184 -> viable/strict/1759343184 2025-12-04T08:57:44.2178026Z * [new tag] viable/strict/1759346540 -> viable/strict/1759346540 2025-12-04T08:57:44.2179163Z * [new tag] viable/strict/1759348181 -> viable/strict/1759348181 2025-12-04T08:57:44.2180108Z * [new tag] viable/strict/1759350324 -> viable/strict/1759350324 2025-12-04T08:57:44.2180940Z * [new tag] viable/strict/1759351793 -> viable/strict/1759351793 2025-12-04T08:57:44.2181893Z * [new tag] viable/strict/1759353844 -> viable/strict/1759353844 2025-12-04T08:57:44.2182697Z * [new tag] viable/strict/1759355374 -> viable/strict/1759355374 2025-12-04T08:57:44.2183688Z * [new tag] viable/strict/1759357472 -> viable/strict/1759357472 2025-12-04T08:57:44.2184445Z * [new tag] viable/strict/1759361002 -> viable/strict/1759361002 2025-12-04T08:57:44.2185744Z * [new tag] viable/strict/1759362585 -> viable/strict/1759362585 2025-12-04T08:57:44.2186830Z * [new tag] viable/strict/1759365359 -> viable/strict/1759365359 2025-12-04T08:57:44.2187813Z * [new tag] viable/strict/1759370089 -> viable/strict/1759370089 2025-12-04T08:57:44.2188767Z * [new tag] viable/strict/1759377554 -> viable/strict/1759377554 2025-12-04T08:57:44.2189705Z * [new tag] viable/strict/1759379133 -> viable/strict/1759379133 2025-12-04T08:57:44.2190553Z * [new tag] viable/strict/1759389871 -> viable/strict/1759389871 2025-12-04T08:57:44.2191679Z * [new tag] viable/strict/1759393562 -> viable/strict/1759393562 2025-12-04T08:57:44.2192600Z * [new tag] viable/strict/1759395076 -> viable/strict/1759395076 2025-12-04T08:57:44.2193554Z * [new tag] viable/strict/1759398579 -> viable/strict/1759398579 2025-12-04T08:57:44.2194496Z * [new tag] viable/strict/1759404142 -> viable/strict/1759404142 2025-12-04T08:57:44.2195268Z * [new tag] viable/strict/1759405773 -> viable/strict/1759405773 2025-12-04T08:57:44.2196202Z * [new tag] viable/strict/1759408041 -> viable/strict/1759408041 2025-12-04T08:57:44.2197028Z * [new tag] viable/strict/1759411593 -> viable/strict/1759411593 2025-12-04T08:57:44.2197945Z * [new tag] viable/strict/1759427395 -> viable/strict/1759427395 2025-12-04T08:57:44.2198773Z * [new tag] viable/strict/1759434582 -> viable/strict/1759434582 2025-12-04T08:57:44.2199783Z * [new tag] viable/strict/1759436720 -> viable/strict/1759436720 2025-12-04T08:57:44.2200607Z * [new tag] viable/strict/1759440219 -> viable/strict/1759440219 2025-12-04T08:57:44.2201591Z * [new tag] viable/strict/1759441948 -> viable/strict/1759441948 2025-12-04T08:57:44.2202524Z * [new tag] viable/strict/1759443860 -> viable/strict/1759443860 2025-12-04T08:57:44.2203279Z * [new tag] viable/strict/1759445377 -> viable/strict/1759445377 2025-12-04T08:57:44.2204326Z * [new tag] viable/strict/1759447415 -> viable/strict/1759447415 2025-12-04T08:57:44.2205132Z * [new tag] viable/strict/1759451750 -> viable/strict/1759451750 2025-12-04T08:57:44.2206089Z * [new tag] viable/strict/1759453910 -> viable/strict/1759453910 2025-12-04T08:57:44.2207002Z * [new tag] viable/strict/1759456483 -> 
viable/strict/1759456483 2025-12-04T08:57:44.2208036Z * [new tag] viable/strict/1759459279 -> viable/strict/1759459279 2025-12-04T08:57:44.2208995Z * [new tag] viable/strict/1759460742 -> viable/strict/1759460742 2025-12-04T08:57:44.2209906Z * [new tag] viable/strict/1759462025 -> viable/strict/1759462025 2025-12-04T08:57:44.2210815Z * [new tag] viable/strict/1759469086 -> viable/strict/1759469086 2025-12-04T08:57:44.2211636Z * [new tag] viable/strict/1759470581 -> viable/strict/1759470581 2025-12-04T08:57:44.2212625Z * [new tag] viable/strict/1759472786 -> viable/strict/1759472786 2025-12-04T08:57:44.2213833Z * [new tag] viable/strict/1759476294 -> viable/strict/1759476294 2025-12-04T08:57:44.2214764Z * [new tag] viable/strict/1759479963 -> viable/strict/1759479963 2025-12-04T08:57:44.2215826Z * [new tag] viable/strict/1759492177 -> viable/strict/1759492177 2025-12-04T08:57:44.2216750Z * [new tag] viable/strict/1759519278 -> viable/strict/1759519278 2025-12-04T08:57:44.2217590Z * [new tag] viable/strict/1759524580 -> viable/strict/1759524580 2025-12-04T08:57:44.2218542Z * [new tag] viable/strict/1759528193 -> viable/strict/1759528193 2025-12-04T08:57:44.2219718Z * [new tag] viable/strict/1759533797 -> viable/strict/1759533797 2025-12-04T08:57:44.2220651Z * [new tag] viable/strict/1759542780 -> viable/strict/1759542780 2025-12-04T08:57:44.2221604Z * [new tag] viable/strict/1759549779 -> viable/strict/1759549779 2025-12-04T08:57:44.2222523Z * [new tag] viable/strict/1759555455 -> viable/strict/1759555455 2025-12-04T08:57:44.2223442Z * [new tag] viable/strict/1759559176 -> viable/strict/1759559176 2025-12-04T08:57:44.2224429Z * [new tag] viable/strict/1759560629 -> viable/strict/1759560629 2025-12-04T08:57:44.2225277Z * [new tag] viable/strict/1759569848 -> viable/strict/1759569848 2025-12-04T08:57:44.2226623Z * [new tag] viable/strict/1759571382 -> viable/strict/1759571382 2025-12-04T08:57:44.2227443Z * [new tag] viable/strict/1759573474 -> viable/strict/1759573474 2025-12-04T08:57:44.2228369Z * [new tag] viable/strict/1759618187 -> viable/strict/1759618187 2025-12-04T08:57:44.2229355Z * [new tag] viable/strict/1759626742 -> viable/strict/1759626742 2025-12-04T08:57:44.2230678Z * [new tag] viable/strict/1759632427 -> viable/strict/1759632427 2025-12-04T08:57:44.2231512Z * [new tag] viable/strict/1759634971 -> viable/strict/1759634971 2025-12-04T08:57:44.2232507Z * [new tag] viable/strict/1759661382 -> viable/strict/1759661382 2025-12-04T08:57:44.2233501Z * [new tag] viable/strict/1759663294 -> viable/strict/1759663294 2025-12-04T08:57:44.2234169Z * [new tag] viable/strict/1759708178 -> viable/strict/1759708178 2025-12-04T08:57:44.2235112Z * [new tag] viable/strict/1759715695 -> viable/strict/1759715695 2025-12-04T08:57:44.2236084Z * [new tag] viable/strict/1759728293 -> viable/strict/1759728293 2025-12-04T08:57:44.2236876Z * [new tag] viable/strict/1759735513 -> viable/strict/1759735513 2025-12-04T08:57:44.2237833Z * [new tag] viable/strict/1759739177 -> viable/strict/1759739177 2025-12-04T08:57:44.2238753Z * [new tag] viable/strict/1759758635 -> viable/strict/1759758635 2025-12-04T08:57:44.2239681Z * [new tag] viable/strict/1759765784 -> viable/strict/1759765784 2025-12-04T08:57:44.2240579Z * [new tag] viable/strict/1759767948 -> viable/strict/1759767948 2025-12-04T08:57:44.2241480Z * [new tag] viable/strict/1759771461 -> viable/strict/1759771461 2025-12-04T08:57:44.2242185Z * [new tag] viable/strict/1759776706 -> viable/strict/1759776706 2025-12-04T08:57:44.2243146Z * [new tag] 
viable/strict/1759782317 -> viable/strict/1759782317 2025-12-04T08:57:44.2244143Z * [new tag] viable/strict/1759783777 -> viable/strict/1759783777 2025-12-04T08:57:44.2245077Z * [new tag] viable/strict/1759785815 -> viable/strict/1759785815 2025-12-04T08:57:44.2246088Z * [new tag] viable/strict/1759789459 -> viable/strict/1759789459 2025-12-04T08:57:44.2247015Z * [new tag] viable/strict/1759790974 -> viable/strict/1759790974 2025-12-04T08:57:44.2247717Z * [new tag] viable/strict/1759794583 -> viable/strict/1759794583 2025-12-04T08:57:44.2248655Z * [new tag] viable/strict/1759797408 -> viable/strict/1759797408 2025-12-04T08:57:44.2249570Z * [new tag] viable/strict/1759799518 -> viable/strict/1759799518 2025-12-04T08:57:44.2250462Z * [new tag] viable/strict/1759804909 -> viable/strict/1759804909 2025-12-04T08:57:44.2251367Z * [new tag] viable/strict/1759807643 -> viable/strict/1759807643 2025-12-04T08:57:44.2252270Z * [new tag] viable/strict/1759809089 -> viable/strict/1759809089 2025-12-04T08:57:44.2253255Z * [new tag] viable/strict/1759811145 -> viable/strict/1759811145 2025-12-04T08:57:44.2254440Z * [new tag] viable/strict/1759812581 -> viable/strict/1759812581 2025-12-04T08:57:44.2255438Z * [new tag] viable/strict/1759814683 -> viable/strict/1759814683 2025-12-04T08:57:44.2256364Z * [new tag] viable/strict/1759821889 -> viable/strict/1759821889 2025-12-04T08:57:44.2257325Z * [new tag] viable/strict/1759823376 -> viable/strict/1759823376 2025-12-04T08:57:44.2258277Z * [new tag] viable/strict/1759827107 -> viable/strict/1759827107 2025-12-04T08:57:44.2259128Z * [new tag] viable/strict/1759830577 -> viable/strict/1759830577 2025-12-04T08:57:44.2260190Z * [new tag] viable/strict/1759832720 -> viable/strict/1759832720 2025-12-04T08:57:44.2261018Z * [new tag] viable/strict/1759842063 -> viable/strict/1759842063 2025-12-04T08:57:44.2261982Z * [new tag] viable/strict/1759847121 -> viable/strict/1759847121 2025-12-04T08:57:44.2263236Z * [new tag] viable/strict/1759850721 -> viable/strict/1759850721 2025-12-04T08:57:44.2264065Z * [new tag] viable/strict/1759857870 -> viable/strict/1759857870 2025-12-04T08:57:44.2265114Z * [new tag] viable/strict/1759863143 -> viable/strict/1759863143 2025-12-04T08:57:44.2266060Z * [new tag] viable/strict/1759875874 -> viable/strict/1759875874 2025-12-04T08:57:44.2266840Z * [new tag] viable/strict/1759877385 -> viable/strict/1759877385 2025-12-04T08:57:44.2267786Z * [new tag] viable/strict/1759883801 -> viable/strict/1759883801 2025-12-04T08:57:44.2268621Z * [new tag] viable/strict/1759885922 -> viable/strict/1759885922 2025-12-04T08:57:44.2269696Z * [new tag] viable/strict/1759888488 -> viable/strict/1759888488 2025-12-04T08:57:44.2270399Z * [new tag] viable/strict/1759895471 -> viable/strict/1759895471 2025-12-04T08:57:44.2271400Z * [new tag] viable/strict/1759904803 -> viable/strict/1759904803 2025-12-04T08:57:44.2272508Z * [new tag] viable/strict/1759908300 -> viable/strict/1759908300 2025-12-04T08:57:44.2273459Z * [new tag] viable/strict/1759915520 -> viable/strict/1759915520 2025-12-04T08:57:44.2274275Z * [new tag] viable/strict/1759916978 -> viable/strict/1759916978 2025-12-04T08:57:44.2275052Z * [new tag] viable/strict/1759930024 -> viable/strict/1759930024 2025-12-04T08:57:44.2276121Z * [new tag] viable/strict/1759948122 -> viable/strict/1759948122 2025-12-04T08:57:44.2277082Z * [new tag] viable/strict/1759952983 -> viable/strict/1759952983 2025-12-04T08:57:44.2278061Z * [new tag] viable/strict/1759955121 -> viable/strict/1759955121 
2025-12-04T08:57:44.2279198Z * [new tag] viable/strict/1759962298 -> viable/strict/1759962298 2025-12-04T08:57:44.2283016Z * [new tag] viable/strict/1759965837 -> viable/strict/1759965837 2025-12-04T08:57:44.2284051Z * [new tag] viable/strict/1759970213 -> viable/strict/1759970213 2025-12-04T08:57:44.2284912Z * [new tag] viable/strict/1759974894 -> viable/strict/1759974894 2025-12-04T08:57:44.2285866Z * [new tag] viable/strict/1759977763 -> viable/strict/1759977763 2025-12-04T08:57:44.2286898Z * [new tag] viable/strict/1759979241 -> viable/strict/1759979241 2025-12-04T08:57:44.2287865Z * [new tag] viable/strict/1759985417 -> viable/strict/1759985417 2025-12-04T08:57:44.2288694Z * [new tag] viable/strict/1759987490 -> viable/strict/1759987490 2025-12-04T08:57:44.2289698Z * [new tag] viable/strict/1759996180 -> viable/strict/1759996180 2025-12-04T08:57:44.2290669Z * [new tag] viable/strict/1760065682 -> viable/strict/1760065682 2025-12-04T08:57:44.2291737Z * [new tag] viable/strict/1760066894 -> viable/strict/1760066894 2025-12-04T08:57:44.2292653Z * [new tag] viable/strict/1760070345 -> viable/strict/1760070345 2025-12-04T08:57:44.2294335Z * [new tag] viable/strict/1760089782 -> viable/strict/1760089782 2025-12-04T08:57:44.2295268Z * [new tag] viable/strict/1760091921 -> viable/strict/1760091921 2025-12-04T08:57:44.2296330Z * [new tag] viable/strict/1760127924 -> viable/strict/1760127924 2025-12-04T08:57:44.2297171Z * [new tag] viable/strict/1760129489 -> viable/strict/1760129489 2025-12-04T08:57:44.2298242Z * [new tag] viable/strict/1760132980 -> viable/strict/1760132980 2025-12-04T08:57:44.2299215Z * [new tag] viable/strict/1760135060 -> viable/strict/1760135060 2025-12-04T08:57:44.2300206Z * [new tag] viable/strict/1760215782 -> viable/strict/1760215782 2025-12-04T08:57:44.2301156Z * [new tag] viable/strict/1760273849 -> viable/strict/1760273849 2025-12-04T08:57:44.2302015Z * [new tag] viable/strict/1760275517 -> viable/strict/1760275517 2025-12-04T08:57:44.2302995Z * [new tag] viable/strict/1760276979 -> viable/strict/1760276979 2025-12-04T08:57:44.2303978Z * [new tag] viable/strict/1760279007 -> viable/strict/1760279007 2025-12-04T08:57:44.2304667Z * [new tag] viable/strict/1760286328 -> viable/strict/1760286328 2025-12-04T08:57:44.2305564Z * [new tag] viable/strict/1760493304 -> viable/strict/1760493304 2025-12-04T08:57:44.2306526Z * [new tag] viable/strict/1760496298 -> viable/strict/1760496298 2025-12-04T08:57:44.2307581Z * [new tag] viable/strict/1760518396 -> viable/strict/1760518396 2025-12-04T08:57:44.2308320Z * [new tag] viable/strict/1760534864 -> viable/strict/1760534864 2025-12-04T08:57:44.2309225Z * [new tag] viable/strict/1760549062 -> viable/strict/1760549062 2025-12-04T08:57:44.2310363Z * [new tag] viable/strict/1760552799 -> viable/strict/1760552799 2025-12-04T08:57:44.2311292Z * [new tag] viable/strict/1760554355 -> viable/strict/1760554355 2025-12-04T08:57:44.2312253Z * [new tag] viable/strict/1760556275 -> viable/strict/1760556275 2025-12-04T08:57:44.2313183Z * [new tag] viable/strict/1760564979 -> viable/strict/1760564979 2025-12-04T08:57:44.2314118Z * [new tag] viable/strict/1760567049 -> viable/strict/1760567049 2025-12-04T08:57:44.2315502Z * [new tag] viable/strict/1760568585 -> viable/strict/1760568585 2025-12-04T08:57:44.2316437Z * [new tag] viable/strict/1760570630 -> viable/strict/1760570630 2025-12-04T08:57:44.2317238Z * [new tag] viable/strict/1760572180 -> viable/strict/1760572180 2025-12-04T08:57:44.2318190Z * [new tag] viable/strict/1760575094 -> 
viable/strict/1760575094 2025-12-04T08:57:44.2319224Z * [new tag] viable/strict/1760579709 -> viable/strict/1760579709 2025-12-04T08:57:44.2320586Z * [new tag] viable/strict/1760582614 -> viable/strict/1760582614 2025-12-04T08:57:44.2321537Z * [new tag] viable/strict/1760586815 -> viable/strict/1760586815 2025-12-04T08:57:44.2322277Z * [new tag] viable/strict/1760588829 -> viable/strict/1760588829 2025-12-04T08:57:44.2323216Z * [new tag] viable/strict/1760590200 -> viable/strict/1760590200 2025-12-04T08:57:44.2324190Z * [new tag] viable/strict/1760592311 -> viable/strict/1760592311 2025-12-04T08:57:44.2325249Z * [new tag] viable/strict/1760619733 -> viable/strict/1760619733 2025-12-04T08:57:44.2325994Z * [new tag] viable/strict/1760628335 -> viable/strict/1760628335 2025-12-04T08:57:44.2326910Z * [new tag] viable/strict/1760635490 -> viable/strict/1760635490 2025-12-04T08:57:44.2327724Z * [new tag] viable/strict/1760640743 -> viable/strict/1760640743 2025-12-04T08:57:44.2328673Z * [new tag] viable/strict/1760642528 -> viable/strict/1760642528 2025-12-04T08:57:44.2329581Z * [new tag] viable/strict/1760646330 -> viable/strict/1760646330 2025-12-04T08:57:44.2330378Z * [new tag] viable/strict/1760666101 -> viable/strict/1760666101 2025-12-04T08:57:44.2331385Z * [new tag] viable/strict/1760668990 -> viable/strict/1760668990 2025-12-04T08:57:44.2332211Z * [new tag] viable/strict/1760670600 -> viable/strict/1760670600 2025-12-04T08:57:44.2333266Z * [new tag] viable/strict/1760671704 -> viable/strict/1760671704 2025-12-04T08:57:44.2334427Z * [new tag] viable/strict/1760673121 -> viable/strict/1760673121 2025-12-04T08:57:44.2335293Z * [new tag] viable/strict/1760675352 -> viable/strict/1760675352 2025-12-04T08:57:44.2336331Z * [new tag] viable/strict/1760696731 -> viable/strict/1760696731 2025-12-04T08:57:44.2338854Z * [new tag] viable/strict/1760723515 -> viable/strict/1760723515 2025-12-04T08:57:44.2339710Z * [new tag] viable/strict/1760727234 -> viable/strict/1760727234 2025-12-04T08:57:44.2340732Z * [new tag] viable/strict/1760730578 -> viable/strict/1760730578 2025-12-04T08:57:44.2341684Z * [new tag] viable/strict/1760732726 -> viable/strict/1760732726 2025-12-04T08:57:44.2342735Z * [new tag] viable/strict/1760734180 -> viable/strict/1760734180 2025-12-04T08:57:44.2343755Z * [new tag] viable/strict/1760736251 -> viable/strict/1760736251 2025-12-04T08:57:44.2344593Z * [new tag] viable/strict/1760737772 -> viable/strict/1760737772 2025-12-04T08:57:44.2345718Z * [new tag] viable/strict/1760758005 -> viable/strict/1760758005 2025-12-04T08:57:44.2346638Z * [new tag] viable/strict/1760761532 -> viable/strict/1760761532 2025-12-04T08:57:44.2347532Z * [new tag] viable/strict/1760802581 -> viable/strict/1760802581 2025-12-04T08:57:44.2348374Z * [new tag] viable/strict/1760827772 -> viable/strict/1760827772 2025-12-04T08:57:44.2349302Z * [new tag] viable/strict/1760834524 -> viable/strict/1760834524 2025-12-04T08:57:44.2350347Z * [new tag] viable/strict/1760845009 -> viable/strict/1760845009 2025-12-04T08:57:44.2351277Z * [new tag] viable/strict/1760876836 -> viable/strict/1760876836 2025-12-04T08:57:44.2352182Z * [new tag] viable/strict/1760880329 -> viable/strict/1760880329 2025-12-04T08:57:44.2353016Z * [new tag] viable/strict/1760888987 -> viable/strict/1760888987 2025-12-04T08:57:44.2353945Z * [new tag] viable/strict/1760912664 -> viable/strict/1760912664 2025-12-04T08:57:44.2354866Z * [new tag] viable/strict/1760925321 -> viable/strict/1760925321 2025-12-04T08:57:44.2355657Z * [new tag] 
viable/strict/1760931488 -> viable/strict/1760931488 2025-12-04T08:57:44.2357052Z * [new tag] viable/strict/1760932693 -> viable/strict/1760932693 2025-12-04T08:57:44.2358014Z * [new tag] viable/strict/1761004184 -> viable/strict/1761004184 2025-12-04T08:57:44.2358840Z * [new tag] viable/strict/1761014748 -> viable/strict/1761014748 2025-12-04T08:57:44.2359799Z * [new tag] viable/strict/1761017491 -> viable/strict/1761017491 2025-12-04T08:57:44.2360738Z * [new tag] viable/strict/1761018806 -> viable/strict/1761018806 2025-12-04T08:57:44.2361764Z * [new tag] viable/strict/1761020754 -> viable/strict/1761020754 2025-12-04T08:57:44.2362588Z * [new tag] viable/strict/1761024303 -> viable/strict/1761024303 2025-12-04T08:57:44.2363513Z * [new tag] viable/strict/1761029582 -> viable/strict/1761029582 2025-12-04T08:57:44.2364434Z * [new tag] viable/strict/1761031535 -> viable/strict/1761031535 2025-12-04T08:57:44.2365255Z * [new tag] viable/strict/1761035196 -> viable/strict/1761035196 2025-12-04T08:57:44.2366346Z * [new tag] viable/strict/1761045825 -> viable/strict/1761045825 2025-12-04T08:57:44.2367284Z * [new tag] viable/strict/1761054796 -> viable/strict/1761054796 2025-12-04T08:57:44.2368207Z * [new tag] viable/strict/1761060314 -> viable/strict/1761060314 2025-12-04T08:57:44.2369135Z * [new tag] viable/strict/1761071198 -> viable/strict/1761071198 2025-12-04T08:57:44.2370069Z * [new tag] viable/strict/1761074628 -> viable/strict/1761074628 2025-12-04T08:57:44.2371049Z * [new tag] viable/strict/1761078351 -> viable/strict/1761078351 2025-12-04T08:57:44.2371952Z * [new tag] viable/strict/1761079822 -> viable/strict/1761079822 2025-12-04T08:57:44.2372937Z * [new tag] viable/strict/1761081873 -> viable/strict/1761081873 2025-12-04T08:57:44.2374292Z * [new tag] viable/strict/1761083392 -> viable/strict/1761083392 2025-12-04T08:57:44.2375249Z * [new tag] viable/strict/1761085465 -> viable/strict/1761085465 2025-12-04T08:57:44.2376202Z * [new tag] viable/strict/1761089099 -> viable/strict/1761089099 2025-12-04T08:57:44.2377139Z * [new tag] viable/strict/1761095535 -> viable/strict/1761095535 2025-12-04T08:57:44.2378055Z * [new tag] viable/strict/1761098119 -> viable/strict/1761098119 2025-12-04T08:57:44.2379775Z * [new tag] viable/strict/1761101330 -> viable/strict/1761101330 2025-12-04T08:57:44.2380704Z * [new tag] viable/strict/1761114425 -> viable/strict/1761114425 2025-12-04T08:57:44.2381638Z * [new tag] viable/strict/1761116036 -> viable/strict/1761116036 2025-12-04T08:57:44.2382591Z * [new tag] viable/strict/1761119379 -> viable/strict/1761119379 2025-12-04T08:57:44.2383531Z * [new tag] viable/strict/1761121601 -> viable/strict/1761121601 2025-12-04T08:57:44.2384470Z * [new tag] viable/strict/1761123234 -> viable/strict/1761123234 2025-12-04T08:57:44.2385327Z * [new tag] viable/strict/1761126621 -> viable/strict/1761126621 2025-12-04T08:57:44.2386267Z * [new tag] viable/strict/1761132259 -> viable/strict/1761132259 2025-12-04T08:57:44.2387273Z * [new tag] viable/strict/1761146746 -> viable/strict/1761146746 2025-12-04T08:57:44.2388207Z * [new tag] viable/strict/1761164752 -> viable/strict/1761164752 2025-12-04T08:57:44.2389139Z * [new tag] viable/strict/1761166198 -> viable/strict/1761166198 2025-12-04T08:57:44.2390131Z * [new tag] viable/strict/1761175424 -> viable/strict/1761175424 2025-12-04T08:57:44.2391154Z * [new tag] viable/strict/1761176983 -> viable/strict/1761176983 2025-12-04T08:57:44.2392211Z * [new tag] viable/strict/1761179891 -> viable/strict/1761179891 
2025-12-04T08:57:44.2393117Z * [new tag] viable/strict/1761181930 -> viable/strict/1761181930 2025-12-04T08:57:44.2394054Z * [new tag] viable/strict/1761184516 -> viable/strict/1761184516 2025-12-04T08:57:44.2395086Z * [new tag] viable/strict/1761190179 -> viable/strict/1761190179 2025-12-04T08:57:44.2396012Z * [new tag] viable/strict/1761193558 -> viable/strict/1761193558 2025-12-04T08:57:44.2396933Z * [new tag] viable/strict/1761207990 -> viable/strict/1761207990 2025-12-04T08:57:44.2397861Z * [new tag] viable/strict/1761229539 -> viable/strict/1761229539 2025-12-04T08:57:44.2399002Z * [new tag] viable/strict/1761244031 -> viable/strict/1761244031 2025-12-04T08:57:44.2399966Z * [new tag] viable/strict/1761248986 -> viable/strict/1761248986 2025-12-04T08:57:44.2400791Z * [new tag] viable/strict/1761259791 -> viable/strict/1761259791 2025-12-04T08:57:44.2401714Z * [new tag] viable/strict/1761266139 -> viable/strict/1761266139 2025-12-04T08:57:44.2402663Z * [new tag] viable/strict/1761268316 -> viable/strict/1761268316 2025-12-04T08:57:44.2403576Z * [new tag] viable/strict/1761273805 -> viable/strict/1761273805 2025-12-04T08:57:44.2404573Z * [new tag] viable/strict/1761275261 -> viable/strict/1761275261 2025-12-04T08:57:44.2405562Z * [new tag] viable/strict/1761277913 -> viable/strict/1761277913 2025-12-04T08:57:44.2406543Z * [new tag] viable/strict/1761290701 -> viable/strict/1761290701 2025-12-04T08:57:44.2407469Z * [new tag] viable/strict/1761294396 -> viable/strict/1761294396 2025-12-04T08:57:44.2408371Z * [new tag] viable/strict/1761303047 -> viable/strict/1761303047 2025-12-04T08:57:44.2409311Z * [new tag] viable/strict/1761335388 -> viable/strict/1761335388 2025-12-04T08:57:44.2410239Z * [new tag] viable/strict/1761337551 -> viable/strict/1761337551 2025-12-04T08:57:44.2411066Z * [new tag] viable/strict/1761339007 -> viable/strict/1761339007 2025-12-04T08:57:44.2412014Z * [new tag] viable/strict/1761341050 -> viable/strict/1761341050 2025-12-04T08:57:44.2413099Z * [new tag] viable/strict/1761346188 -> viable/strict/1761346188 2025-12-04T08:57:44.2414439Z * [new tag] viable/strict/1761349792 -> viable/strict/1761349792 2025-12-04T08:57:44.2415399Z * [new tag] viable/strict/1761352620 -> viable/strict/1761352620 2025-12-04T08:57:44.2416248Z * [new tag] viable/strict/1761354730 -> viable/strict/1761354730 2025-12-04T08:57:44.2417285Z * [new tag] viable/strict/1761357298 -> viable/strict/1761357298 2025-12-04T08:57:44.2418242Z * [new tag] viable/strict/1761360201 -> viable/strict/1761360201 2025-12-04T08:57:44.2419670Z * [new tag] viable/strict/1761361753 -> viable/strict/1761361753 2025-12-04T08:57:44.2420600Z * [new tag] viable/strict/1761364351 -> viable/strict/1761364351 2025-12-04T08:57:44.2421533Z * [new tag] viable/strict/1761366338 -> viable/strict/1761366338 2025-12-04T08:57:44.2422619Z * [new tag] viable/strict/1761367802 -> viable/strict/1761367802 2025-12-04T08:57:44.2423558Z * [new tag] viable/strict/1761369889 -> viable/strict/1761369889 2025-12-04T08:57:44.2424705Z * [new tag] viable/strict/1761371385 -> viable/strict/1761371385 2025-12-04T08:57:44.2425779Z * [new tag] viable/strict/1761373581 -> viable/strict/1761373581 2025-12-04T08:57:44.2426844Z * [new tag] viable/strict/1761375054 -> viable/strict/1761375054 2025-12-04T08:57:44.2427768Z * [new tag] viable/strict/1761421785 -> viable/strict/1761421785 2025-12-04T08:57:44.2428765Z * [new tag] viable/strict/1761434614 -> viable/strict/1761434614 2025-12-04T08:57:44.2430008Z * [new tag] viable/strict/1761439254 -> 
viable/strict/1761439254 2025-12-04T08:57:44.2430997Z * [new tag] viable/strict/1761454187 -> viable/strict/1761454187 2025-12-04T08:57:44.2432005Z * [new tag] viable/strict/1761459991 -> viable/strict/1761459991 2025-12-04T08:57:44.2433085Z * [new tag] viable/strict/1761470668 -> viable/strict/1761470668 2025-12-04T08:57:44.2434396Z * [new tag] viable/strict/1761472188 -> viable/strict/1761472188 2025-12-04T08:57:44.2435363Z * [new tag] viable/strict/1761503178 -> viable/strict/1761503178 2025-12-04T08:57:44.2436282Z * [new tag] viable/strict/1761517492 -> viable/strict/1761517492 2025-12-04T08:57:44.2437350Z * [new tag] viable/strict/1761518981 -> viable/strict/1761518981 2025-12-04T08:57:44.2438327Z * [new tag] viable/strict/1761533609 -> viable/strict/1761533609 2025-12-04T08:57:44.2439075Z * [new tag] viable/strict/1761546438 -> viable/strict/1761546438 2025-12-04T08:57:44.2440130Z * [new tag] viable/strict/1761548133 -> viable/strict/1761548133 2025-12-04T08:57:44.2441317Z * [new tag] viable/strict/1761555186 -> viable/strict/1761555186 2025-12-04T08:57:44.2442349Z * [new tag] viable/strict/1761557178 -> viable/strict/1761557178 2025-12-04T08:57:44.2443277Z * [new tag] viable/strict/1761560772 -> viable/strict/1761560772 2025-12-04T08:57:44.2444216Z * [new tag] viable/strict/1761562266 -> viable/strict/1761562266 2025-12-04T08:57:44.2445234Z * [new tag] viable/strict/1761564260 -> viable/strict/1761564260 2025-12-04T08:57:44.2446147Z * [new tag] viable/strict/1761568072 -> viable/strict/1761568072 2025-12-04T08:57:44.2447135Z * [new tag] viable/strict/1761571683 -> viable/strict/1761571683 2025-12-04T08:57:44.2447854Z * [new tag] viable/strict/1761580199 -> viable/strict/1761580199 2025-12-04T08:57:44.2448863Z * [new tag] viable/strict/1761587383 -> viable/strict/1761587383 2025-12-04T08:57:44.2449901Z * [new tag] viable/strict/1761591165 -> viable/strict/1761591165 2025-12-04T08:57:44.2450739Z * [new tag] viable/strict/1761594575 -> viable/strict/1761594575 2025-12-04T08:57:44.2451680Z * [new tag] viable/strict/1761596710 -> viable/strict/1761596710 2025-12-04T08:57:44.2452606Z * [new tag] viable/strict/1761598189 -> viable/strict/1761598189 2025-12-04T08:57:44.2453841Z * [new tag] viable/strict/1761600254 -> viable/strict/1761600254 2025-12-04T08:57:44.2454870Z * [new tag] viable/strict/1761603879 -> viable/strict/1761603879 2025-12-04T08:57:44.2455829Z * [new tag] viable/strict/1761605429 -> viable/strict/1761605429 2025-12-04T08:57:44.2456861Z * [new tag] viable/strict/1761607468 -> viable/strict/1761607468 2025-12-04T08:57:44.2457831Z * [new tag] viable/strict/1761608983 -> viable/strict/1761608983 2025-12-04T08:57:44.2458879Z * [new tag] viable/strict/1761611846 -> viable/strict/1761611846 2025-12-04T08:57:44.2459859Z * [new tag] viable/strict/1761613922 -> viable/strict/1761613922 2025-12-04T08:57:44.2460616Z * [new tag] viable/strict/1761616504 -> viable/strict/1761616504 2025-12-04T08:57:44.2461412Z * [new tag] viable/strict/1761619599 -> viable/strict/1761619599 2025-12-04T08:57:44.2462439Z * [new tag] viable/strict/1761686693 -> viable/strict/1761686693 2025-12-04T08:57:44.2463289Z * [new tag] viable/strict/1761688179 -> viable/strict/1761688179 2025-12-04T08:57:44.2464283Z * [new tag] viable/strict/1761691973 -> viable/strict/1761691973 2025-12-04T08:57:44.2465494Z * [new tag] viable/strict/1761693884 -> viable/strict/1761693884 2025-12-04T08:57:44.2466442Z * [new tag] viable/strict/1761695389 -> viable/strict/1761695389 2025-12-04T08:57:44.2467385Z * [new tag] 
viable/strict/1761698408 -> viable/strict/1761698408 2025-12-04T08:57:44.2468407Z * [new tag] viable/strict/1761702931 -> viable/strict/1761702931 2025-12-04T08:57:44.2469359Z * [new tag] viable/strict/1761706307 -> viable/strict/1761706307 2025-12-04T08:57:44.2470299Z * [new tag] viable/strict/1761709065 -> viable/strict/1761709065 2025-12-04T08:57:44.2471349Z * [new tag] viable/strict/1761710285 -> viable/strict/1761710285 2025-12-04T08:57:44.2472289Z * [new tag] viable/strict/1761711983 -> viable/strict/1761711983 2025-12-04T08:57:44.2473275Z * [new tag] viable/strict/1761713514 -> viable/strict/1761713514 2025-12-04T08:57:44.2474330Z * [new tag] viable/strict/1761715523 -> viable/strict/1761715523 2025-12-04T08:57:44.2475343Z * [new tag] viable/strict/1761727973 -> viable/strict/1761727973 2025-12-04T08:57:44.2476339Z * [new tag] viable/strict/1761751558 -> viable/strict/1761751558 2025-12-04T08:57:44.2477294Z * [new tag] viable/strict/1761755187 -> viable/strict/1761755187 2025-12-04T08:57:44.2478282Z * [new tag] viable/strict/1761756826 -> viable/strict/1761756826 2025-12-04T08:57:44.2479702Z * [new tag] viable/strict/1761769551 -> viable/strict/1761769551 2025-12-04T08:57:44.2480769Z * [new tag] viable/strict/1761771032 -> viable/strict/1761771032 2025-12-04T08:57:44.2481574Z * [new tag] viable/strict/1761773101 -> viable/strict/1761773101 2025-12-04T08:57:44.2482674Z * [new tag] viable/strict/1761781792 -> viable/strict/1761781792 2025-12-04T08:57:44.2484129Z * [new tag] viable/strict/1761784788 -> viable/strict/1761784788 2025-12-04T08:57:44.2485127Z * [new tag] viable/strict/1761786740 -> viable/strict/1761786740 2025-12-04T08:57:44.2486238Z * [new tag] viable/strict/1761789332 -> viable/strict/1761789332 2025-12-04T08:57:44.2487646Z * [new tag] viable/strict/1761792569 -> viable/strict/1761792569 2025-12-04T08:57:44.2488626Z * [new tag] viable/strict/1761795289 -> viable/strict/1761795289 2025-12-04T08:57:44.2489637Z * [new tag] viable/strict/1761798345 -> viable/strict/1761798345 2025-12-04T08:57:44.2490629Z * [new tag] viable/strict/1761799827 -> viable/strict/1761799827 2025-12-04T08:57:44.2491784Z * [new tag] viable/strict/1761805604 -> viable/strict/1761805604 2025-12-04T08:57:44.2492732Z * [new tag] viable/strict/1761807202 -> viable/strict/1761807202 2025-12-04T08:57:44.2494004Z * [new tag] viable/strict/1761809094 -> viable/strict/1761809094 2025-12-04T08:57:44.2495021Z * [new tag] viable/strict/1761810576 -> viable/strict/1761810576 2025-12-04T08:57:44.2496097Z * [new tag] viable/strict/1761812771 -> viable/strict/1761812771 2025-12-04T08:57:44.2497117Z * [new tag] viable/strict/1761814363 -> viable/strict/1761814363 2025-12-04T08:57:44.2498084Z * [new tag] viable/strict/1761857410 -> viable/strict/1761857410 2025-12-04T08:57:44.2499076Z * [new tag] viable/strict/1761860985 -> viable/strict/1761860985 2025-12-04T08:57:44.2500054Z * [new tag] viable/strict/1761863094 -> viable/strict/1761863094 2025-12-04T08:57:44.2501013Z * [new tag] viable/strict/1761864590 -> viable/strict/1761864590 2025-12-04T08:57:44.2502116Z * [new tag] viable/strict/1761866675 -> viable/strict/1761866675 2025-12-04T08:57:44.2503328Z * [new tag] viable/strict/1761868178 -> viable/strict/1761868178 2025-12-04T08:57:44.2504313Z * [new tag] viable/strict/1761871111 -> viable/strict/1761871111 2025-12-04T08:57:44.2505285Z * [new tag] viable/strict/1761873126 -> viable/strict/1761873126 2025-12-04T08:57:44.2506415Z * [new tag] viable/strict/1761875714 -> viable/strict/1761875714 
2025-12-04T08:57:44.2507397Z * [new tag] viable/strict/1761878924 -> viable/strict/1761878924 2025-12-04T08:57:44.2508429Z * [new tag] viable/strict/1761881727 -> viable/strict/1761881727 2025-12-04T08:57:44.2509387Z * [new tag] viable/strict/1761882959 -> viable/strict/1761882959 2025-12-04T08:57:44.2510313Z * [new tag] viable/strict/1761886268 -> viable/strict/1761886268 2025-12-04T08:57:44.2511291Z * [new tag] viable/strict/1761893641 -> viable/strict/1761893641 2025-12-04T08:57:44.2512244Z * [new tag] viable/strict/1761931517 -> viable/strict/1761931517 2025-12-04T08:57:44.2513223Z * [new tag] viable/strict/1761933080 -> viable/strict/1761933080 2025-12-04T08:57:44.2514163Z * [new tag] viable/strict/1761935217 -> viable/strict/1761935217 2025-12-04T08:57:44.2515239Z * [new tag] viable/strict/1761938533 -> viable/strict/1761938533 2025-12-04T08:57:44.2516243Z * [new tag] viable/strict/1761940184 -> viable/strict/1761940184 2025-12-04T08:57:44.2517196Z * [new tag] viable/strict/1761942338 -> viable/strict/1761942338 2025-12-04T08:57:44.2518137Z * [new tag] viable/strict/1761946100 -> viable/strict/1761946100 2025-12-04T08:57:44.2519107Z * [new tag] viable/strict/1761947374 -> viable/strict/1761947374 2025-12-04T08:57:44.2520071Z * [new tag] viable/strict/1761950978 -> viable/strict/1761950978 2025-12-04T08:57:44.2521022Z * [new tag] viable/strict/1761957727 -> viable/strict/1761957727 2025-12-04T08:57:44.2521964Z * [new tag] viable/strict/1761959532 -> viable/strict/1761959532 2025-12-04T08:57:44.2523056Z * [new tag] viable/strict/1761965366 -> viable/strict/1761965366 2025-12-04T08:57:44.2524098Z * [new tag] viable/strict/1761968066 -> viable/strict/1761968066 2025-12-04T08:57:44.2525067Z * [new tag] viable/strict/1761969322 -> viable/strict/1761969322 2025-12-04T08:57:44.2526155Z * [new tag] viable/strict/1761974723 -> viable/strict/1761974723 2025-12-04T08:57:44.2527158Z * [new tag] viable/strict/1761981837 -> viable/strict/1761981837 2025-12-04T08:57:44.2528152Z * [new tag] viable/strict/1761985546 -> viable/strict/1761985546 2025-12-04T08:57:44.2529161Z * [new tag] viable/strict/1761987030 -> viable/strict/1761987030 2025-12-04T08:57:44.2530182Z * [new tag] viable/strict/1762003554 -> viable/strict/1762003554 2025-12-04T08:57:44.2531188Z * [new tag] viable/strict/1762021560 -> viable/strict/1762021560 2025-12-04T08:57:44.2532132Z * [new tag] viable/strict/1762032190 -> viable/strict/1762032190 2025-12-04T08:57:44.2533136Z * [new tag] viable/strict/1762040981 -> viable/strict/1762040981 2025-12-04T08:57:44.2534591Z * [new tag] viable/strict/1762048525 -> viable/strict/1762048525 2025-12-04T08:57:44.2535573Z * [new tag] viable/strict/1762104223 -> viable/strict/1762104223 2025-12-04T08:57:44.2536552Z * [new tag] viable/strict/1762105778 -> viable/strict/1762105778 2025-12-04T08:57:44.2537550Z * [new tag] viable/strict/1762115109 -> viable/strict/1762115109 2025-12-04T08:57:44.2538512Z * [new tag] viable/strict/1762125840 -> viable/strict/1762125840 2025-12-04T08:57:44.2539320Z * [new tag] viable/strict/1762127377 -> viable/strict/1762127377 2025-12-04T08:57:44.2540769Z * [new tag] viable/strict/1762134925 -> viable/strict/1762134925 2025-12-04T08:57:44.2541558Z * [new tag] viable/strict/1762138338 -> viable/strict/1762138338 2025-12-04T08:57:44.2542652Z * [new tag] viable/strict/1762148993 -> viable/strict/1762148993 2025-12-04T08:57:44.2543677Z * [new tag] viable/strict/1762152871 -> viable/strict/1762152871 2025-12-04T08:57:44.2544692Z * [new tag] viable/strict/1762156183 -> 
viable/strict/1762156183 2025-12-04T08:57:44.2545831Z * [new tag] viable/strict/1762163457 -> viable/strict/1762163457 2025-12-04T08:57:44.2546791Z * [new tag] viable/strict/1762165569 -> viable/strict/1762165569 2025-12-04T08:57:44.2547730Z * [new tag] viable/strict/1762169035 -> viable/strict/1762169035 2025-12-04T08:57:44.2548726Z * [new tag] viable/strict/1762174936 -> viable/strict/1762174936 2025-12-04T08:57:44.2550096Z * [new tag] viable/strict/1762194412 -> viable/strict/1762194412 2025-12-04T08:57:44.2551037Z * [new tag] viable/strict/1762195876 -> viable/strict/1762195876 2025-12-04T08:57:44.2552004Z * [new tag] viable/strict/1762197788 -> viable/strict/1762197788 2025-12-04T08:57:44.2553010Z * [new tag] viable/strict/1762199389 -> viable/strict/1762199389 2025-12-04T08:57:44.2554187Z * [new tag] viable/strict/1762206585 -> viable/strict/1762206585 2025-12-04T08:57:44.2555258Z * [new tag] viable/strict/1762210184 -> viable/strict/1762210184 2025-12-04T08:57:44.2556028Z * [new tag] viable/strict/1762218736 -> viable/strict/1762218736 2025-12-04T08:57:44.2557116Z * [new tag] viable/strict/1762224529 -> viable/strict/1762224529 2025-12-04T08:57:44.2558120Z * [new tag] viable/strict/1762227253 -> viable/strict/1762227253 2025-12-04T08:57:44.2558941Z * [new tag] viable/strict/1762228515 -> viable/strict/1762228515 2025-12-04T08:57:44.2560185Z * [new tag] viable/strict/1762230349 -> viable/strict/1762230349 2025-12-04T08:57:44.2561006Z * [new tag] viable/strict/1762231859 -> viable/strict/1762231859 2025-12-04T08:57:44.2562051Z * [new tag] viable/strict/1762233925 -> viable/strict/1762233925 2025-12-04T08:57:44.2563133Z * [new tag] viable/strict/1762237630 -> viable/strict/1762237630 2025-12-04T08:57:44.2563908Z * [new tag] viable/strict/1762253522 -> viable/strict/1762253522 2025-12-04T08:57:44.2565015Z * [new tag] viable/strict/1762278588 -> viable/strict/1762278588 2025-12-04T08:57:44.2566131Z * [new tag] viable/strict/1762284203 -> viable/strict/1762284203 2025-12-04T08:57:44.2567239Z * [new tag] viable/strict/1762289446 -> viable/strict/1762289446 2025-12-04T08:57:44.2586536Z * [new tag] viable/strict/1762291515 -> viable/strict/1762291515 2025-12-04T08:57:44.2586862Z * [new tag] viable/strict/1762295100 -> viable/strict/1762295100 2025-12-04T08:57:44.2587093Z * [new tag] viable/strict/1762296590 -> viable/strict/1762296590 2025-12-04T08:57:44.2587318Z * [new tag] viable/strict/1762300179 -> viable/strict/1762300179 2025-12-04T08:57:44.2587531Z * [new tag] viable/strict/1762303207 -> viable/strict/1762303207 2025-12-04T08:57:44.2587749Z * [new tag] viable/strict/1762386584 -> viable/strict/1762386584 2025-12-04T08:57:44.2587957Z * [new tag] viable/strict/1762391537 -> viable/strict/1762391537 2025-12-04T08:57:44.2588165Z * [new tag] viable/strict/1762394119 -> viable/strict/1762394119 2025-12-04T08:57:44.2588382Z * [new tag] viable/strict/1762397437 -> viable/strict/1762397437 2025-12-04T08:57:44.2588590Z * [new tag] viable/strict/1762400256 -> viable/strict/1762400256 2025-12-04T08:57:44.2588813Z * [new tag] viable/strict/1762401469 -> viable/strict/1762401469 2025-12-04T08:57:44.2589024Z * [new tag] viable/strict/1762408195 -> viable/strict/1762408195 2025-12-04T08:57:44.2589237Z * [new tag] viable/strict/1762410411 -> viable/strict/1762410411 2025-12-04T08:57:44.2589451Z * [new tag] viable/strict/1762417613 -> viable/strict/1762417613 2025-12-04T08:57:44.2589659Z * [new tag] viable/strict/1762419198 -> viable/strict/1762419198 2025-12-04T08:57:44.2589880Z * [new tag] 
viable/strict/1762422656 -> viable/strict/1762422656 2025-12-04T08:57:44.2590085Z * [new tag] viable/strict/1762424746 -> viable/strict/1762424746 2025-12-04T08:57:44.2590399Z * [new tag] viable/strict/1762446386 -> viable/strict/1762446386 2025-12-04T08:57:44.2590614Z * [new tag] viable/strict/1762449912 -> viable/strict/1762449912 2025-12-04T08:57:44.2591723Z * [new tag] viable/strict/1762457031 -> viable/strict/1762457031 2025-12-04T08:57:44.2592479Z * [new tag] viable/strict/1762462441 -> viable/strict/1762462441 2025-12-04T08:57:44.2593504Z * [new tag] viable/strict/1762467909 -> viable/strict/1762467909 2025-12-04T08:57:44.2594536Z * [new tag] viable/strict/1762471493 -> viable/strict/1762471493 2025-12-04T08:57:44.2595548Z * [new tag] viable/strict/1762475990 -> viable/strict/1762475990 2025-12-04T08:57:44.2596642Z * [new tag] viable/strict/1762477933 -> viable/strict/1762477933 2025-12-04T08:57:44.2597636Z * [new tag] viable/strict/1762491053 -> viable/strict/1762491053 2025-12-04T08:57:44.2598603Z * [new tag] viable/strict/1762493118 -> viable/strict/1762493118 2025-12-04T08:57:44.2599570Z * [new tag] viable/strict/1762498442 -> viable/strict/1762498442 2025-12-04T08:57:44.2600739Z * [new tag] viable/strict/1762501778 -> viable/strict/1762501778 2025-12-04T08:57:44.2601527Z * [new tag] viable/strict/1762504001 -> viable/strict/1762504001 2025-12-04T08:57:44.2602713Z * [new tag] viable/strict/1762505583 -> viable/strict/1762505583 2025-12-04T08:57:44.2603849Z * [new tag] viable/strict/1762507523 -> viable/strict/1762507523 2025-12-04T08:57:44.2604898Z * [new tag] viable/strict/1762511140 -> viable/strict/1762511140 2025-12-04T08:57:44.2605999Z * [new tag] viable/strict/1762512632 -> viable/strict/1762512632 2025-12-04T08:57:44.2607028Z * [new tag] viable/strict/1762520467 -> viable/strict/1762520467 2025-12-04T08:57:44.2608044Z * [new tag] viable/strict/1762522016 -> viable/strict/1762522016 2025-12-04T08:57:44.2609022Z * [new tag] viable/strict/1762530591 -> viable/strict/1762530591 2025-12-04T08:57:44.2610008Z * [new tag] viable/strict/1762543405 -> viable/strict/1762543405 2025-12-04T08:57:44.2610776Z * [new tag] viable/strict/1762544998 -> viable/strict/1762544998 2025-12-04T08:57:44.2611820Z * [new tag] viable/strict/1762552182 -> viable/strict/1762552182 2025-12-04T08:57:44.2612770Z * [new tag] viable/strict/1762554297 -> viable/strict/1762554297 2025-12-04T08:57:44.2613946Z * [new tag] viable/strict/1762559381 -> viable/strict/1762559381 2025-12-04T08:57:44.2615003Z * [new tag] viable/strict/1762562222 -> viable/strict/1762562222 2025-12-04T08:57:44.2616055Z * [new tag] viable/strict/1762564319 -> viable/strict/1762564319 2025-12-04T08:57:44.2616859Z * [new tag] viable/strict/1762566904 -> viable/strict/1762566904 2025-12-04T08:57:44.2617900Z * [new tag] viable/strict/1762569781 -> viable/strict/1762569781 2025-12-04T08:57:44.2619283Z * [new tag] viable/strict/1762575940 -> viable/strict/1762575940 2025-12-04T08:57:44.2620301Z * [new tag] viable/strict/1762580974 -> viable/strict/1762580974 2025-12-04T08:57:44.2621303Z * [new tag] viable/strict/1762583185 -> viable/strict/1762583185 2025-12-04T08:57:44.2622301Z * [new tag] viable/strict/1762586647 -> viable/strict/1762586647 2025-12-04T08:57:44.2623324Z * [new tag] viable/strict/1762588183 -> viable/strict/1762588183 2025-12-04T08:57:44.2624331Z * [new tag] viable/strict/1762593886 -> viable/strict/1762593886 2025-12-04T08:57:44.2625463Z * [new tag] viable/strict/1762650743 -> viable/strict/1762650743 
2025-12-04T08:57:44.2626536Z * [new tag] viable/strict/1762653328 -> viable/strict/1762653328 2025-12-04T08:57:44.2627552Z * [new tag] viable/strict/1762659342 -> viable/strict/1762659342 2025-12-04T08:57:44.2628542Z * [new tag] viable/strict/1762662360 -> viable/strict/1762662360 2025-12-04T08:57:44.2629485Z * [new tag] viable/strict/1762667377 -> viable/strict/1762667377 2025-12-04T08:57:44.2630451Z * [new tag] viable/strict/1762671090 -> viable/strict/1762671090 2025-12-04T08:57:44.2631442Z * [new tag] viable/strict/1762680284 -> viable/strict/1762680284 2025-12-04T08:57:44.2632631Z * [new tag] viable/strict/1762683900 -> viable/strict/1762683900 2025-12-04T08:57:44.2633634Z * [new tag] viable/strict/1762705541 -> viable/strict/1762705541 2025-12-04T08:57:44.2634598Z * [new tag] viable/strict/1762709004 -> viable/strict/1762709004 2025-12-04T08:57:44.2635601Z * [new tag] viable/strict/1762746004 -> viable/strict/1762746004 2025-12-04T08:57:44.2637210Z * [new tag] viable/strict/1762748799 -> viable/strict/1762748799 2025-12-04T08:57:44.2639045Z * [new tag] viable/strict/1762759504 -> viable/strict/1762759504 2025-12-04T08:57:44.2639317Z * [new tag] viable/strict/1762760973 -> viable/strict/1762760973 2025-12-04T08:57:44.2639856Z * [new tag] viable/strict/1762775374 -> viable/strict/1762775374 2025-12-04T08:57:44.2640915Z * [new tag] viable/strict/1762777661 -> viable/strict/1762777661 2025-12-04T08:57:44.2641881Z * [new tag] viable/strict/1762779774 -> viable/strict/1762779774 2025-12-04T08:57:44.2643000Z * [new tag] viable/strict/1762781259 -> viable/strict/1762781259 2025-12-04T08:57:44.2644087Z * [new tag] viable/strict/1762793628 -> viable/strict/1762793628 2025-12-04T08:57:44.2645107Z * [new tag] viable/strict/1762800711 -> viable/strict/1762800711 2025-12-04T08:57:44.2646101Z * [new tag] viable/strict/1762809894 -> viable/strict/1762809894 2025-12-04T08:57:44.2647062Z * [new tag] viable/strict/1762811384 -> viable/strict/1762811384 2025-12-04T08:57:44.2648163Z * [new tag] viable/strict/1762813841 -> viable/strict/1762813841 2025-12-04T08:57:44.2649118Z * [new tag] viable/strict/1762815047 -> viable/strict/1762815047 2025-12-04T08:57:44.2650248Z * [new tag] viable/strict/1762817094 -> viable/strict/1762817094 2025-12-04T08:57:44.2651272Z * [new tag] viable/strict/1762818582 -> viable/strict/1762818582 2025-12-04T08:57:44.2652333Z * [new tag] viable/strict/1762821623 -> viable/strict/1762821623 2025-12-04T08:57:44.2653177Z * [new tag] viable/strict/1762823531 -> viable/strict/1762823531 2025-12-04T08:57:44.2654627Z * [new tag] viable/strict/1762849583 -> viable/strict/1762849583 2025-12-04T08:57:44.2655603Z * [new tag] viable/strict/1762851200 -> viable/strict/1762851200 2025-12-04T08:57:44.2656630Z * [new tag] viable/strict/1762854603 -> viable/strict/1762854603 2025-12-04T08:57:44.2657703Z * [new tag] viable/strict/1762858276 -> viable/strict/1762858276 2025-12-04T08:57:44.2658775Z * [new tag] viable/strict/1762860891 -> viable/strict/1762860891 2025-12-04T08:57:44.2660406Z * [new tag] viable/strict/1762866174 -> viable/strict/1762866174 2025-12-04T08:57:44.2661463Z * [new tag] viable/strict/1762867653 -> viable/strict/1762867653 2025-12-04T08:57:44.2662443Z * [new tag] viable/strict/1762872669 -> viable/strict/1762872669 2025-12-04T08:57:44.2663240Z * [new tag] viable/strict/1762878380 -> viable/strict/1762878380 2025-12-04T08:57:44.2664331Z * [new tag] viable/strict/1762889003 -> viable/strict/1762889003 2025-12-04T08:57:44.2665465Z * [new tag] viable/strict/1762890589 -> 
viable/strict/1762890589 2025-12-04T08:57:44.2666503Z * [new tag] viable/strict/1762892743 -> viable/strict/1762892743 2025-12-04T08:57:44.2667488Z * [new tag] viable/strict/1762894271 -> viable/strict/1762894271 2025-12-04T08:57:44.2668301Z * [new tag] viable/strict/1762896287 -> viable/strict/1762896287 2025-12-04T08:57:44.2669316Z * [new tag] viable/strict/1762915871 -> viable/strict/1762915871 2025-12-04T08:57:44.2670428Z * [new tag] viable/strict/1762918569 -> viable/strict/1762918569 2025-12-04T08:57:44.2671214Z * [new tag] viable/strict/1762919776 -> viable/strict/1762919776 2025-12-04T08:57:44.2672268Z * [new tag] viable/strict/1762923072 -> viable/strict/1762923072 2025-12-04T08:57:44.2673273Z * [new tag] viable/strict/1762928826 -> viable/strict/1762928826 2025-12-04T08:57:44.2674350Z * [new tag] viable/strict/1762930451 -> viable/strict/1762930451 2025-12-04T08:57:44.2675383Z * [new tag] viable/strict/1762933780 -> viable/strict/1762933780 2025-12-04T08:57:44.2676378Z * [new tag] viable/strict/1762937638 -> viable/strict/1762937638 2025-12-04T08:57:44.2677542Z * [new tag] viable/strict/1762939545 -> viable/strict/1762939545 2025-12-04T08:57:44.2678541Z * [new tag] viable/strict/1762962692 -> viable/strict/1762962692 2025-12-04T08:57:44.2680023Z * [new tag] viable/strict/1762979143 -> viable/strict/1762979143 2025-12-04T08:57:44.2681008Z * [new tag] viable/strict/1762984188 -> viable/strict/1762984188 2025-12-04T08:57:44.2681795Z * [new tag] viable/strict/1762986306 -> viable/strict/1762986306 2025-12-04T08:57:44.2682886Z * [new tag] viable/strict/1762989903 -> viable/strict/1762989903 2025-12-04T08:57:44.2683908Z * [new tag] viable/strict/1762991377 -> viable/strict/1762991377 2025-12-04T08:57:44.2684931Z * [new tag] viable/strict/1762998921 -> viable/strict/1762998921 2025-12-04T08:57:44.2686532Z * [new tag] viable/strict/1763002287 -> viable/strict/1763002287 2025-12-04T08:57:44.2687566Z * [new tag] viable/strict/1763016840 -> viable/strict/1763016840 2025-12-04T08:57:44.2688611Z * [new tag] viable/strict/1763020180 -> viable/strict/1763020180 2025-12-04T08:57:44.2689686Z * [new tag] viable/strict/1763027421 -> viable/strict/1763027421 2025-12-04T08:57:44.2690696Z * [new tag] viable/strict/1763031120 -> viable/strict/1763031120 2025-12-04T08:57:44.2691883Z * [new tag] viable/strict/1763036861 -> viable/strict/1763036861 2025-12-04T08:57:44.2692883Z * [new tag] viable/strict/1763038993 -> viable/strict/1763038993 2025-12-04T08:57:44.2694356Z * [new tag] viable/strict/1763054703 -> viable/strict/1763054703 2025-12-04T08:57:44.2695160Z * [new tag] viable/strict/1763067061 -> viable/strict/1763067061 2025-12-04T08:57:44.2696247Z * [new tag] viable/strict/1763070847 -> viable/strict/1763070847 2025-12-04T08:57:44.2697283Z * [new tag] viable/strict/1763072706 -> viable/strict/1763072706 2025-12-04T08:57:44.2698388Z * [new tag] viable/strict/1763076302 -> viable/strict/1763076302 2025-12-04T08:57:44.2699399Z * [new tag] viable/strict/1763080816 -> viable/strict/1763080816 2025-12-04T08:57:44.2700426Z * [new tag] viable/strict/1763082732 -> viable/strict/1763082732 2025-12-04T08:57:44.2701419Z * [new tag] viable/strict/1763085329 -> viable/strict/1763085329 2025-12-04T08:57:44.2702430Z * [new tag] viable/strict/1763088623 -> viable/strict/1763088623 2025-12-04T08:57:44.2703633Z * [new tag] viable/strict/1763091402 -> viable/strict/1763091402 2025-12-04T08:57:44.2704741Z * [new tag] viable/strict/1763092602 -> viable/strict/1763092602 2025-12-04T08:57:44.2705885Z * [new tag] 
viable/strict/1763094355 -> viable/strict/1763094355 2025-12-04T08:57:44.2706920Z * [new tag] viable/strict/1763099390 -> viable/strict/1763099390 2025-12-04T08:57:44.2707925Z * [new tag] viable/strict/1763101608 -> viable/strict/1763101608 2025-12-04T08:57:44.2708952Z * [new tag] viable/strict/1763105102 -> viable/strict/1763105102 2025-12-04T08:57:44.2709984Z * [new tag] viable/strict/1763112347 -> viable/strict/1763112347 2025-12-04T08:57:44.2710980Z * [new tag] viable/strict/1763119471 -> viable/strict/1763119471 2025-12-04T08:57:44.2711754Z * [new tag] viable/strict/1763126835 -> viable/strict/1763126835 2025-12-04T08:57:44.2712679Z * [new tag] viable/strict/1763149779 -> viable/strict/1763149779 2025-12-04T08:57:44.2713769Z * [new tag] viable/strict/1763164178 -> viable/strict/1763164178 2025-12-04T08:57:44.2714618Z * [new tag] viable/strict/1763167104 -> viable/strict/1763167104 2025-12-04T08:57:44.2715596Z * [new tag] viable/strict/1763169132 -> viable/strict/1763169132 2025-12-04T08:57:44.2716560Z * [new tag] viable/strict/1763171708 -> viable/strict/1763171708 2025-12-04T08:57:44.2717515Z * [new tag] viable/strict/1763174759 -> viable/strict/1763174759 2025-12-04T08:57:44.2718595Z * [new tag] viable/strict/1763180744 -> viable/strict/1763180744 2025-12-04T08:57:44.2719560Z * [new tag] viable/strict/1763182227 -> viable/strict/1763182227 2025-12-04T08:57:44.2720572Z * [new tag] viable/strict/1763184309 -> viable/strict/1763184309 2025-12-04T08:57:44.2722054Z * [new tag] viable/strict/1763187991 -> viable/strict/1763187991 2025-12-04T08:57:44.2723073Z * [new tag] viable/strict/1763191445 -> viable/strict/1763191445 2025-12-04T08:57:44.2724263Z * [new tag] viable/strict/1763195152 -> viable/strict/1763195152 2025-12-04T08:57:44.2725066Z * [new tag] viable/strict/1763205769 -> viable/strict/1763205769 2025-12-04T08:57:44.2726127Z * [new tag] viable/strict/1763246990 -> viable/strict/1763246990 2025-12-04T08:57:44.2727203Z * [new tag] viable/strict/1763261578 -> viable/strict/1763261578 2025-12-04T08:57:44.2728033Z * [new tag] viable/strict/1763286573 -> viable/strict/1763286573 2025-12-04T08:57:44.2728989Z * [new tag] viable/strict/1763292167 -> viable/strict/1763292167 2025-12-04T08:57:44.2729966Z * [new tag] viable/strict/1763333386 -> viable/strict/1763333386 2025-12-04T08:57:44.2730946Z * [new tag] viable/strict/1763340082 -> viable/strict/1763340082 2025-12-04T08:57:44.2732701Z * [new tag] viable/strict/1763364324 -> viable/strict/1763364324 2025-12-04T08:57:44.2734040Z * [new tag] viable/strict/1763371569 -> viable/strict/1763371569 2025-12-04T08:57:44.2735062Z * [new tag] viable/strict/1763373067 -> viable/strict/1763373067 2025-12-04T08:57:44.2736053Z * [new tag] viable/strict/1763375157 -> viable/strict/1763375157 2025-12-04T08:57:44.2737055Z * [new tag] viable/strict/1763382462 -> viable/strict/1763382462 2025-12-04T08:57:44.2738245Z * [new tag] viable/strict/1763394661 -> viable/strict/1763394661 2025-12-04T08:57:44.2739742Z * [new tag] viable/strict/1763396797 -> viable/strict/1763396797 2025-12-04T08:57:44.2740825Z * [new tag] viable/strict/1763398542 -> viable/strict/1763398542 2025-12-04T08:57:44.2741937Z * [new tag] viable/strict/1763401807 -> viable/strict/1763401807 2025-12-04T08:57:44.2742742Z * [new tag] viable/strict/1763414698 -> viable/strict/1763414698 2025-12-04T08:57:44.2743846Z * [new tag] viable/strict/1763419807 -> viable/strict/1763419807 2025-12-04T08:57:44.2744882Z * [new tag] viable/strict/1763426369 -> viable/strict/1763426369 
2025-12-04T08:57:44.2746035Z * [new tag] viable/strict/1763428331 -> viable/strict/1763428331 2025-12-04T08:57:44.2747128Z * [new tag] viable/strict/1763430922 -> viable/strict/1763430922 2025-12-04T08:57:44.2747919Z * [new tag] viable/strict/1763434184 -> viable/strict/1763434184 2025-12-04T08:57:44.2748963Z * [new tag] viable/strict/1763439973 -> viable/strict/1763439973 2025-12-04T08:57:44.2749971Z * [new tag] viable/strict/1763444995 -> viable/strict/1763444995 2025-12-04T08:57:44.2751029Z * [new tag] viable/strict/1763447206 -> viable/strict/1763447206 2025-12-04T08:57:44.2751995Z * [new tag] viable/strict/1763448826 -> viable/strict/1763448826 2025-12-04T08:57:44.2752995Z * [new tag] viable/strict/1763450717 -> viable/strict/1763450717 2025-12-04T08:57:44.2754441Z * [new tag] viable/strict/1763452183 -> viable/strict/1763452183 2025-12-04T08:57:44.2755600Z * [new tag] viable/strict/1763457945 -> viable/strict/1763457945 2025-12-04T08:57:44.2756594Z * [new tag] viable/strict/1763459439 -> viable/strict/1763459439 2025-12-04T08:57:44.2757385Z * [new tag] viable/strict/1763461556 -> viable/strict/1763461556 2025-12-04T08:57:44.2758435Z * [new tag] viable/strict/1763463103 -> viable/strict/1763463103 2025-12-04T08:57:44.2759483Z * [new tag] viable/strict/1763465100 -> viable/strict/1763465100 2025-12-04T08:57:44.2760277Z * [new tag] viable/strict/1763468866 -> viable/strict/1763468866 2025-12-04T08:57:44.2761180Z * [new tag] viable/strict/1763493823 -> viable/strict/1763493823 2025-12-04T08:57:44.2761976Z * [new tag] viable/strict/1763496249 -> viable/strict/1763496249 2025-12-04T08:57:44.2763032Z * [new tag] viable/strict/1763502620 -> viable/strict/1763502620 2025-12-04T08:57:44.2764053Z * [new tag] viable/strict/1763504715 -> viable/strict/1763504715 2025-12-04T08:57:44.2765046Z * [new tag] viable/strict/1763506208 -> viable/strict/1763506208 2025-12-04T08:57:44.2766023Z * [new tag] viable/strict/1763520590 -> viable/strict/1763520590 2025-12-04T08:57:44.2767118Z * [new tag] viable/strict/1763523357 -> viable/strict/1763523357 2025-12-04T08:57:44.2768158Z * [new tag] viable/strict/1763529922 -> viable/strict/1763529922 2025-12-04T08:57:44.2769210Z * [new tag] viable/strict/1763531408 -> viable/strict/1763531408 2025-12-04T08:57:44.2770192Z * [new tag] viable/strict/1763533622 -> viable/strict/1763533622 2025-12-04T08:57:44.2771292Z * [new tag] viable/strict/1763538576 -> viable/strict/1763538576 2025-12-04T08:57:44.2772372Z * [new tag] viable/strict/1763545823 -> viable/strict/1763545823 2025-12-04T08:57:44.2773229Z * [new tag] viable/strict/1763547951 -> viable/strict/1763547951 2025-12-04T08:57:44.2774641Z * [new tag] viable/strict/1763551477 -> viable/strict/1763551477 2025-12-04T08:57:44.2775673Z * [new tag] viable/strict/1763552982 -> viable/strict/1763552982 2025-12-04T08:57:44.2776718Z * [new tag] viable/strict/1763594698 -> viable/strict/1763594698 2025-12-04T08:57:44.2777769Z * [new tag] viable/strict/1763596178 -> viable/strict/1763596178 2025-12-04T08:57:44.2778967Z * [new tag] viable/strict/1763599155 -> viable/strict/1763599155 2025-12-04T08:57:44.2780072Z * [new tag] viable/strict/1763603717 -> viable/strict/1763603717 2025-12-04T08:57:44.2781237Z * [new tag] viable/strict/1763606923 -> viable/strict/1763606923 2025-12-04T08:57:44.2782160Z * [new tag] viable/strict/1763609715 -> viable/strict/1763609715 2025-12-04T08:57:44.2783168Z * [new tag] viable/strict/1763612757 -> viable/strict/1763612757 2025-12-04T08:57:44.2784191Z * [new tag] viable/strict/1763616325 -> 
viable/strict/1763616325 2025-12-04T08:57:44.2785226Z * [new tag] viable/strict/1763623509 -> viable/strict/1763623509 2025-12-04T08:57:44.2786372Z * [new tag] viable/strict/1763624984 -> viable/strict/1763624984 2025-12-04T08:57:44.2787378Z * [new tag] viable/strict/1763628796 -> viable/strict/1763628796 2025-12-04T08:57:44.2788486Z * [new tag] viable/strict/1763634343 -> viable/strict/1763634343 2025-12-04T08:57:44.2789300Z * [new tag] viable/strict/1763635867 -> viable/strict/1763635867 2025-12-04T08:57:44.2790567Z * [new tag] viable/strict/1763639382 -> viable/strict/1763639382 2025-12-04T08:57:44.2791628Z * [new tag] viable/strict/1763646626 -> viable/strict/1763646626 2025-12-04T08:57:44.2792788Z * [new tag] viable/strict/1763655997 -> viable/strict/1763655997 2025-12-04T08:57:44.2793798Z * [new tag] viable/strict/1763659444 -> viable/strict/1763659444 2025-12-04T08:57:44.2794737Z * [new tag] viable/strict/1763660992 -> viable/strict/1763660992 2025-12-04T08:57:44.2795682Z * [new tag] viable/strict/1763663201 -> viable/strict/1763663201 2025-12-04T08:57:44.2796736Z * [new tag] viable/strict/1763670362 -> viable/strict/1763670362 2025-12-04T08:57:44.2797561Z * [new tag] viable/strict/1763675378 -> viable/strict/1763675378 2025-12-04T08:57:44.2798572Z * [new tag] viable/strict/1763693343 -> viable/strict/1763693343 2025-12-04T08:57:44.2799535Z * [new tag] viable/strict/1763696088 -> viable/strict/1763696088 2025-12-04T08:57:44.2800635Z * [new tag] viable/strict/1763697343 -> viable/strict/1763697343 2025-12-04T08:57:44.2801614Z * [new tag] viable/strict/1763699165 -> viable/strict/1763699165 2025-12-04T08:57:44.2802619Z * [new tag] viable/strict/1763700660 -> viable/strict/1763700660 2025-12-04T08:57:44.2803607Z * [new tag] viable/strict/1763704209 -> viable/strict/1763704209 2025-12-04T08:57:44.2804700Z * [new tag] viable/strict/1763706411 -> viable/strict/1763706411 2025-12-04T08:57:44.2805648Z * [new tag] viable/strict/1763708082 -> viable/strict/1763708082 2025-12-04T08:57:44.2806596Z * [new tag] viable/strict/1763711381 -> viable/strict/1763711381 2025-12-04T08:57:44.2807505Z * [new tag] viable/strict/1763713593 -> viable/strict/1763713593 2025-12-04T08:57:44.2808509Z * [new tag] viable/strict/1763715201 -> viable/strict/1763715201 2025-12-04T08:57:44.2809476Z * [new tag] viable/strict/1763733017 -> viable/strict/1763733017 2025-12-04T08:57:44.2810448Z * [new tag] viable/strict/1763735108 -> viable/strict/1763735108 2025-12-04T08:57:44.2811435Z * [new tag] viable/strict/1763749579 -> viable/strict/1763749579 2025-12-04T08:57:44.2812384Z * [new tag] viable/strict/1763751113 -> viable/strict/1763751113 2025-12-04T08:57:44.2813684Z * [new tag] viable/strict/1763753035 -> viable/strict/1763753035 2025-12-04T08:57:44.2814853Z * [new tag] viable/strict/1763754578 -> viable/strict/1763754578 2025-12-04T08:57:44.2815903Z * [new tag] viable/strict/1763756748 -> viable/strict/1763756748 2025-12-04T08:57:44.2816890Z * [new tag] viable/strict/1763758205 -> viable/strict/1763758205 2025-12-04T08:57:44.2817683Z * [new tag] viable/strict/1763764050 -> viable/strict/1763764050 2025-12-04T08:57:44.2818723Z * [new tag] viable/strict/1763771887 -> viable/strict/1763771887 2025-12-04T08:57:44.2820335Z * [new tag] viable/strict/1763773920 -> viable/strict/1763773920 2025-12-04T08:57:44.2821324Z * [new tag] viable/strict/1763776501 -> viable/strict/1763776501 2025-12-04T08:57:44.2822317Z * [new tag] viable/strict/1763779437 -> viable/strict/1763779437 2025-12-04T08:57:44.2823585Z * [new tag] 
viable/strict/1763781038 -> viable/strict/1763781038 2025-12-04T08:57:44.2824385Z * [new tag] viable/strict/1763782245 -> viable/strict/1763782245 2025-12-04T08:57:44.2825641Z * [new tag] viable/strict/1763785568 -> viable/strict/1763785568 2025-12-04T08:57:44.2826658Z * [new tag] viable/strict/1763787006 -> viable/strict/1763787006 2025-12-04T08:57:44.2827717Z * [new tag] viable/strict/1763789103 -> viable/strict/1763789103 2025-12-04T08:57:44.2828650Z * [new tag] viable/strict/1763790578 -> viable/strict/1763790578 2025-12-04T08:57:44.2829629Z * [new tag] viable/strict/1763796275 -> viable/strict/1763796275 2025-12-04T08:57:44.2830883Z * [new tag] viable/strict/1763801465 -> viable/strict/1763801465 2025-12-04T08:57:44.2831858Z * [new tag] viable/strict/1763803522 -> viable/strict/1763803522 2025-12-04T08:57:44.2832772Z * [new tag] viable/strict/1763808581 -> viable/strict/1763808581 2025-12-04T08:57:44.2833733Z * [new tag] viable/strict/1763840977 -> viable/strict/1763840977 2025-12-04T08:57:44.2834709Z * [new tag] viable/strict/1763846659 -> viable/strict/1763846659 2025-12-04T08:57:44.2835694Z * [new tag] viable/strict/1763872065 -> viable/strict/1763872065 2025-12-04T08:57:44.2836745Z * [new tag] viable/strict/1763873648 -> viable/strict/1763873648 2025-12-04T08:57:44.2837807Z * [new tag] viable/strict/1763875506 -> viable/strict/1763875506 2025-12-04T08:57:44.2838590Z * [new tag] viable/strict/1763889904 -> viable/strict/1763889904 2025-12-04T08:57:44.2839620Z * [new tag] viable/strict/1763930999 -> viable/strict/1763930999 2025-12-04T08:57:44.2840638Z * [new tag] viable/strict/1763944964 -> viable/strict/1763944964 2025-12-04T08:57:44.2841444Z * [new tag] viable/strict/1763958474 -> viable/strict/1763958474 2025-12-04T08:57:44.2842600Z * [new tag] viable/strict/1763967263 -> viable/strict/1763967263 2025-12-04T08:57:44.2843611Z * [new tag] viable/strict/1763972803 -> viable/strict/1763972803 2025-12-04T08:57:44.2844549Z * [new tag] viable/strict/1763976376 -> viable/strict/1763976376 2025-12-04T08:57:44.2845553Z * [new tag] viable/strict/1763989404 -> viable/strict/1763989404 2025-12-04T08:57:44.2846457Z * [new tag] viable/strict/1763990887 -> viable/strict/1763990887 2025-12-04T08:57:44.2847438Z * [new tag] viable/strict/1764019919 -> viable/strict/1764019919 2025-12-04T08:57:44.2848484Z * [new tag] viable/strict/1764023134 -> viable/strict/1764023134 2025-12-04T08:57:44.2849261Z * [new tag] viable/strict/1764024593 -> viable/strict/1764024593 2025-12-04T08:57:44.2850316Z * [new tag] viable/strict/1764026706 -> viable/strict/1764026706 2025-12-04T08:57:44.2851556Z * [new tag] viable/strict/1764031139 -> viable/strict/1764031139 2025-12-04T08:57:44.2852539Z * [new tag] viable/strict/1764033131 -> viable/strict/1764033131 2025-12-04T08:57:44.2853742Z * [new tag] viable/strict/1764035725 -> viable/strict/1764035725 2025-12-04T08:57:44.2854538Z * [new tag] viable/strict/1764624265 -> viable/strict/1764624265 2025-12-04T08:57:44.2855381Z * [new tag] viable/strict/1764631514 -> viable/strict/1764631514 2025-12-04T08:57:44.2856197Z * [new tag] viable/strict/1764632987 -> viable/strict/1764632987 2025-12-04T08:57:44.2857181Z * [new tag] viable/strict/1764636063 -> viable/strict/1764636063 2025-12-04T08:57:44.2857933Z * [new tag] viable/strict/1764643975 -> viable/strict/1764643975 2025-12-04T08:57:44.2858775Z * [new tag] viable/strict/1764646859 -> viable/strict/1764646859 2025-12-04T08:57:44.2859618Z * [new tag] viable/strict/1764653120 -> viable/strict/1764653120 
2025-12-04T08:57:44.2860536Z * [new tag] viable/strict/1764654632 -> viable/strict/1764654632 2025-12-04T08:57:44.2861318Z * [new tag] viable/strict/1764656821 -> viable/strict/1764656821 2025-12-04T08:57:44.2862109Z * [new tag] viable/strict/1764658557 -> viable/strict/1764658557 2025-12-04T08:57:44.2862954Z * [new tag] viable/strict/1764660333 -> viable/strict/1764660333 2025-12-04T08:57:44.2863780Z * [new tag] viable/strict/1764661812 -> viable/strict/1764661812 2025-12-04T08:57:44.2864603Z * [new tag] viable/strict/1764664023 -> viable/strict/1764664023 2025-12-04T08:57:44.2865551Z * [new tag] viable/strict/1764669150 -> viable/strict/1764669150 2025-12-04T08:57:44.2866352Z * [new tag] viable/strict/1764680709 -> viable/strict/1764680709 2025-12-04T08:57:44.2867334Z * [new tag] viable/strict/1764687619 -> viable/strict/1764687619 2025-12-04T08:57:44.2868085Z * [new tag] viable/strict/1764696355 -> viable/strict/1764696355 2025-12-04T08:57:44.2868918Z * [new tag] viable/strict/1764701767 -> viable/strict/1764701767 2025-12-04T08:57:44.2869723Z * [new tag] viable/strict/1764710768 -> viable/strict/1764710768 2025-12-04T08:57:44.2870524Z * [new tag] viable/strict/1764716202 -> viable/strict/1764716202 2025-12-04T08:57:44.2871351Z * [new tag] viable/strict/1764793566 -> viable/strict/1764793566 2025-12-04T08:57:44.2872145Z * [new tag] viable/strict/1764797093 -> viable/strict/1764797093 2025-12-04T08:57:44.2872969Z * [new tag] viable/strict/1764800729 -> viable/strict/1764800729 2025-12-04T08:57:44.2874020Z * [new tag] whc_flight_1 -> whc_flight_1 2025-12-04T08:57:44.2874966Z * [new tag] whc_flight_2 -> whc_flight_2 2025-12-04T08:57:44.2876150Z * [new tag] whc_flight_4 -> whc_flight_4 2025-12-04T08:57:44.3558389Z [command]/usr/bin/git rev-parse --verify --quiet ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32^{object} 2025-12-04T08:57:44.3584582Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:57:44.3587785Z ##[endgroup] 2025-12-04T08:57:44.3588109Z ##[group]Determining the checkout info 2025-12-04T08:57:44.3589253Z ##[endgroup] 2025-12-04T08:57:44.3593715Z [command]/usr/bin/git sparse-checkout disable 2025-12-04T08:57:44.3625974Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-12-04T08:57:44.3650850Z ##[group]Checking out the ref 2025-12-04T08:57:44.3653436Z [command]/usr/bin/git checkout --progress --force ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:57:45.4152846Z Updating files: 80% (16121/20121) 2025-12-04T08:57:45.4447146Z Updating files: 81% (16299/20121) 2025-12-04T08:57:45.4659174Z Updating files: 82% (16500/20121) 2025-12-04T08:57:45.4812643Z Updating files: 83% (16701/20121) 2025-12-04T08:57:45.4950672Z Updating files: 84% (16902/20121) 2025-12-04T08:57:45.5114865Z Updating files: 85% (17103/20121) 2025-12-04T08:57:45.5271287Z Updating files: 86% (17305/20121) 2025-12-04T08:57:45.5411304Z Updating files: 87% (17506/20121) 2025-12-04T08:57:45.5523823Z Updating files: 88% (17707/20121) 2025-12-04T08:57:45.5656849Z Updating files: 89% (17908/20121) 2025-12-04T08:57:45.5835419Z Updating files: 90% (18109/20121) 2025-12-04T08:57:45.5950918Z Updating files: 91% (18311/20121) 2025-12-04T08:57:45.6105162Z Updating files: 92% (18512/20121) 2025-12-04T08:57:45.6288289Z Updating files: 93% (18713/20121) 2025-12-04T08:57:45.6495451Z Updating files: 94% (18914/20121) 2025-12-04T08:57:45.6671121Z Updating files: 95% (19115/20121) 2025-12-04T08:57:45.6829026Z Updating files: 96% (19317/20121) 2025-12-04T08:57:45.6997758Z Updating files: 97% 
(19518/20121) 2025-12-04T08:57:45.7290676Z Updating files: 98% (19719/20121) 2025-12-04T08:57:45.7468832Z Updating files: 99% (19920/20121) 2025-12-04T08:57:45.7469244Z Updating files: 100% (20121/20121) 2025-12-04T08:57:45.7469875Z Updating files: 100% (20121/20121), done. 2025-12-04T08:57:45.7756732Z Note: switching to 'ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32'. 2025-12-04T08:57:45.7757378Z 2025-12-04T08:57:45.7757820Z You are in 'detached HEAD' state. You can look around, make experimental 2025-12-04T08:57:45.7758534Z changes and commit them, and you can discard any commits you make in this 2025-12-04T08:57:45.7759152Z state without impacting any branches by switching back to a branch. 2025-12-04T08:57:45.7759529Z 2025-12-04T08:57:45.7759782Z If you want to create a new branch to retain commits you create, you may 2025-12-04T08:57:45.7760363Z do so (now or later) by using -c with the switch command. Example: 2025-12-04T08:57:45.7760688Z 2025-12-04T08:57:45.7760830Z git switch -c <new-branch-name> 2025-12-04T08:57:45.7761053Z 2025-12-04T08:57:45.7761175Z Or undo this operation with: 2025-12-04T08:57:45.7761388Z 2025-12-04T08:57:45.7761486Z git switch - 2025-12-04T08:57:45.7761645Z 2025-12-04T08:57:45.7761938Z Turn off this advice by setting config variable advice.detachedHead to false 2025-12-04T08:57:45.7762343Z 2025-12-04T08:57:45.7762657Z HEAD is now at ffd9b0fb435 Resolve collective autotuning test failure on arm (#168919) 2025-12-04T08:57:45.7843574Z ##[endgroup] 2025-12-04T08:57:45.7849090Z ##[group]Setting up auth for fetching submodules 2025-12-04T08:57:45.7849794Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T08:57:45.7898192Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-12-04T08:57:45.7924883Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-12-04T08:57:45.7953269Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-12-04T08:57:45.7977280Z ##[endgroup] 2025-12-04T08:57:45.7977785Z ##[group]Fetching submodules 2025-12-04T08:57:45.7981446Z [command]/usr/bin/git submodule sync --recursive 2025-12-04T08:57:45.8307376Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-12-04T08:57:45.8623169Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni' 2025-12-04T08:57:45.8624826Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16' 2025-12-04T08:57:45.8627637Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv' 2025-12-04T08:57:45.8630226Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK' 2025-12-04T08:57:45.8632983Z Submodule 'third_party/NVTX' (https://github.com/NVIDIA/NVTX.git) registered for path 'third_party/NVTX' 2025-12-04T08:57:45.8636274Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator' 2025-12-04T08:57:45.8638877Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK' 2025-12-04T08:57:45.8641894Z Submodule 'third_party/aiter' (https://github.com/ROCm/aiter.git) registered for path 
'third_party/aiter' 2025-12-04T08:57:45.8645202Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark' 2025-12-04T08:57:45.8649311Z Submodule 'third_party/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/composable_kernel' 2025-12-04T08:57:45.8652799Z Submodule 'third_party/cpp-httplib' (https://github.com/yhirose/cpp-httplib.git) registered for path 'third_party/cpp-httplib' 2025-12-04T08:57:45.8656773Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo' 2025-12-04T08:57:45.8660833Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend' 2025-12-04T08:57:45.8664396Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass' 2025-12-04T08:57:45.8668406Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm' 2025-12-04T08:57:45.8674019Z Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'third_party/flash-attention' 2025-12-04T08:57:45.8680281Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 'third_party/flatbuffers' 2025-12-04T08:57:45.8684584Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt' 2025-12-04T08:57:45.8689204Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:57:45.8693949Z Submodule 'third_party/gloo' (https://github.com/pytorch/gloo) registered for path 'third_party/gloo' 2025-12-04T08:57:45.8698805Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest' 2025-12-04T08:57:45.8703458Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep' 2025-12-04T08:57:45.8708378Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi' 2025-12-04T08:57:45.8713203Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto' 2025-12-04T08:57:45.8718198Z Submodule 'third_party/kleidiai' (https://github.com/ARM-software/kleidiai.git) registered for path 'third_party/kleidiai' 2025-12-04T08:57:45.8723249Z Submodule 'third_party/mimalloc' (https://github.com/microsoft/mimalloc.git) registered for path 'third_party/mimalloc' 2025-12-04T08:57:45.8728373Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann' 2025-12-04T08:57:45.8733788Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx' 2025-12-04T08:57:45.8739781Z Submodule 'third_party/opentelemetry-cpp' (https://github.com/open-telemetry/opentelemetry-cpp.git) registered for path 'third_party/opentelemetry-cpp' 2025-12-04T08:57:45.8745143Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft' 2025-12-04T08:57:45.8750929Z Submodule 'third_party/protobuf' (https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf' 2025-12-04T08:57:45.8756580Z Submodule 'third_party/NNPACK_deps/psimd' 
(https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd' 2025-12-04T08:57:45.8762506Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool' 2025-12-04T08:57:45.8769974Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11' 2025-12-04T08:57:45.8776602Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy' 2025-12-04T08:57:45.8783152Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef' 2025-12-04T08:57:45.8789676Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe' 2025-12-04T08:57:45.8820304Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'... 2025-12-04T08:57:46.1148826Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'... 2025-12-04T08:57:46.1149691Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'... 2025-12-04T08:57:46.1150711Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'... 2025-12-04T08:57:46.1151889Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'... 2025-12-04T08:57:46.1152870Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'... 2025-12-04T08:57:46.1153709Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'... 2025-12-04T08:57:46.2108150Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NVTX'... 2025-12-04T08:57:46.5420844Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'... 2025-12-04T08:57:46.5421882Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'... 2025-12-04T08:57:46.5422997Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'... 2025-12-04T08:57:46.5424782Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'... 2025-12-04T08:57:46.5426507Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'... 2025-12-04T08:57:46.5428160Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ittapi'... 2025-12-04T08:57:46.5429772Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'... 2025-12-04T08:57:46.5431570Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kleidiai'... 2025-12-04T08:57:46.5432617Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention'... 2025-12-04T08:57:46.5433608Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpp-httplib'... 2025-12-04T08:57:46.5876131Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'... 2025-12-04T08:57:47.7426039Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'... 2025-12-04T08:57:47.7426953Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'... 2025-12-04T08:57:47.7427796Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'... 
2025-12-04T08:57:47.7428614Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/mimalloc'... 2025-12-04T08:57:47.7429449Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'... 2025-12-04T08:57:47.7430273Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'... 2025-12-04T08:57:47.7431119Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'... 2025-12-04T08:57:47.7431967Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'... 2025-12-04T08:57:47.7614123Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'... 2025-12-04T08:57:58.9681923Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'... 2025-12-04T08:57:58.9682885Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'... 2025-12-04T08:57:58.9683722Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cutlass'... 2025-12-04T08:57:58.9684542Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'... 2025-12-04T08:57:58.9685626Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/composable_kernel'... 2025-12-04T08:57:58.9686468Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nlohmann'... 2025-12-04T08:57:58.9687376Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp'... 2025-12-04T08:57:58.9688180Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'... 2025-12-04T08:57:59.0682969Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter'... 2025-12-04T08:58:02.1752255Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-12-04T08:58:02.1878077Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-12-04T08:58:02.1979524Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-12-04T08:58:02.2244535Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-12-04T08:58:02.3126476Z Submodule path 'third_party/NVTX': checked out '3ebbc93ded7285963bff932c678fa367eb393ba6' 2025-12-04T08:58:02.3720907Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-12-04T08:58:03.1410972Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-12-04T08:58:03.3400787Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-12-04T08:58:03.3421487Z Submodule '3rdparty/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:58:03.3450417Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter/3rdparty/composable_kernel'... 
2025-12-04T08:58:08.1245604Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-12-04T08:58:08.1497212Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-12-04T08:58:08.5204832Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T08:58:08.5737441Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-12-04T08:58:08.6761581Z Submodule path 'third_party/cpuinfo': checked out 'f858c30bcb16f8effd5ff46996f0514539e17abc' 2025-12-04T08:58:08.7259530Z Submodule path 'third_party/cudnn_frontend': checked out '0b1577c8c83401237d601d0d0db5210506705396' 2025-12-04T08:58:09.4116077Z Submodule path 'third_party/cutlass': checked out 'f88806b1e31dfa579842638740216dd41fc6c588' 2025-12-04T08:58:09.5736389Z Submodule path 'third_party/fbgemm': checked out 'c0b988d39a9e47c794d699f29930ed4d7c7e13a4' 2025-12-04T08:58:09.5758886Z Submodule 'external/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/external/asmjit' 2025-12-04T08:58:09.5760248Z Submodule 'external/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:58:09.5762123Z Submodule 'external/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:58:09.5764835Z Submodule 'external/cutlass' (https://github.com/jwfromm/cutlass) registered for path 'third_party/fbgemm/external/cutlass' 2025-12-04T08:58:09.5767733Z Submodule 'external/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/external/googletest' 2025-12-04T08:58:09.5770671Z Submodule 'external/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:58:09.5773636Z Submodule 'external/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/fbgemm/external/json' 2025-12-04T08:58:09.5802970Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/asmjit'... 2025-12-04T08:58:10.6961732Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/hipify_torch'... 2025-12-04T08:58:10.6962788Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cpuinfo'... 2025-12-04T08:58:10.6963815Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/googletest'... 2025-12-04T08:58:10.7964406Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/composable_kernel'... 2025-12-04T08:58:13.9656081Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cutlass'... 2025-12-04T08:58:14.0656599Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/json'... 
2025-12-04T08:58:16.4919993Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-12-04T08:58:16.8614828Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T08:58:16.9672942Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-12-04T08:58:17.6350845Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '98125ce499b0fdf7ffbe0e3052f5b8709f4840f8' 2025-12-04T08:58:17.6844650Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:58:17.6967866Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-12-04T08:58:17.8020417Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-12-04T08:58:17.8743273Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-12-04T08:58:17.8763486Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:58:17.8764845Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:58:17.8794423Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/composable_kernel'... 2025-12-04T08:58:22.2869824Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/cutlass'... 2025-12-04T08:58:22.5397391Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-12-04T08:58:23.1309765Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-12-04T08:58:23.2754305Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-12-04T08:58:23.3064081Z Submodule path 'third_party/fmt': checked out '407c905e45ad75fc29bf0f9bb7c5c2fd3475976f' 2025-12-04T08:58:23.3481169Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-12-04T08:58:23.3752401Z Submodule path 'third_party/gloo': checked out '54cbae0d3a67fa890b4c3d9ee162b7860315e341' 2025-12-04T08:58:23.4217118Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:58:23.4352745Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-12-04T08:58:23.4370828Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn' 2025-12-04T08:58:23.4398719Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'... 
2025-12-04T08:58:39.4717381Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-12-04T08:58:39.4933659Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-12-04T08:58:39.5975315Z Submodule path 'third_party/kineto': checked out '31f85df8fbd89c188f14ef10f1ec65379786b943' 2025-12-04T08:58:39.5994903Z Submodule 'libkineto/third_party/dynolog' (https://github.com/facebookincubator/dynolog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:58:39.5997814Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:58:39.6001233Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:58:39.6028443Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog'... 2025-12-04T08:58:40.2211145Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'... 2025-12-04T08:58:40.7039529Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'... 2025-12-04T08:58:40.7999721Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1' 2025-12-04T08:58:40.8017007Z Submodule 'third_party/DCGM' (https://github.com/NVIDIA/DCGM.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:58:40.8019049Z Submodule 'third_party/cpr' (https://github.com/libcpr/cpr.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:58:40.8021890Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:58:40.8024552Z Submodule 'third_party/gflags' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:58:40.8027300Z Submodule 'third_party/glog' (https://github.com/google/glog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:58:40.8030319Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:58:40.8033275Z Submodule 'third_party/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:58:40.8036544Z Submodule 'third_party/pfs' (https://github.com/dtrugman/pfs.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:58:40.8040031Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:58:40.8069964Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'... 2025-12-04T08:58:43.1614884Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'... 
2025-12-04T08:58:43.1616306Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'... 2025-12-04T08:58:43.1617720Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'... 2025-12-04T08:58:43.1619131Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'... 2025-12-04T08:58:43.1620501Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/glog'... 2025-12-04T08:58:43.1621866Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'... 2025-12-04T08:58:43.1623264Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'... 2025-12-04T08:58:43.2616223Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json'... 2025-12-04T08:58:47.8432840Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-12-04T08:58:47.8621693Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-12-04T08:58:47.9012534Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-12-04T08:58:47.9160556Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-12-04T08:58:47.9176751Z Submodule 'doc' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:58:47.9205602Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'... 
2025-12-04T08:58:48.1971668Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-12-04T08:58:48.2166620Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-12-04T08:58:48.2644956Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:58:48.3683981Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-12-04T08:58:48.3861521Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-12-04T08:58:48.4038622Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a' 2025-12-04T08:58:48.4056024Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:48.4058379Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:48.4086986Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'... 2025-12-04T08:58:50.4092132Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'... 2025-12-04T08:58:50.6748802Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159' 2025-12-04T08:58:50.7245902Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T08:58:50.7586951Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-12-04T08:58:50.8055268Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:58:50.8599528Z Submodule path 'third_party/kleidiai': checked out 'd7770c89632329a9914ef1a90289917597639cbe' 2025-12-04T08:58:50.9002938Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-12-04T08:58:51.0049375Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-12-04T08:58:51.4162416Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-12-04T08:58:51.4202188Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11' 2025-12-04T08:58:51.4231981Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'... 
2025-12-04T08:58:52.2039325Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-12-04T08:58:52.2768131Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-12-04T08:58:52.2787651Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark) registered for path 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:58:52.2790364Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:58:52.2793120Z Submodule 'third_party/ms-gsl' (https://github.com/microsoft/GSL) registered for path 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:58:52.2795810Z Submodule 'third_party/nlohmann-json' (https://github.com/nlohmann/json) registered for path 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:58:52.2798731Z Submodule 'third_party/opentelemetry-proto' (https://github.com/open-telemetry/opentelemetry-proto) registered for path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:58:52.2801428Z Submodule 'third_party/opentracing-cpp' (https://github.com/opentracing/opentracing-cpp.git) registered for path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:58:52.2804293Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:58:52.2807091Z Submodule 'tools/vcpkg' (https://github.com/Microsoft/vcpkg) registered for path 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:58:52.2834992Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/benchmark'... 2025-12-04T08:58:52.6559857Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp'... 2025-12-04T08:58:52.6561195Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentelemetry-proto'... 2025-12-04T08:58:52.6562408Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/ms-gsl'... 2025-12-04T08:58:52.6563580Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp'... 2025-12-04T08:58:52.7560895Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/googletest'... 2025-12-04T08:58:53.2157629Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/nlohmann-json'... 2025-12-04T08:58:59.3637794Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/tools/vcpkg'... 
2025-12-04T08:59:00.0999741Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-12-04T08:59:00.1430849Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-12-04T08:59:00.1608397Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-12-04T08:59:00.2686675Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-12-04T08:59:00.2834757Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-12-04T08:59:00.2986616Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-12-04T08:59:00.3150532Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-12-04T08:59:00.3167251Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:00.3169116Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:00.3198052Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'... 2025-12-04T08:59:02.3276580Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'... 2025-12-04T08:59:02.5924763Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-12-04T08:59:02.6415401Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T08:59:03.1257621Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-12-04T08:59:03.1380823Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-12-04T08:59:03.4217010Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-12-04T08:59:03.4241376Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:03.4242938Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:03.4273185Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'... 2025-12-04T08:59:03.9411062Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'... 
2025-12-04T08:59:04.2809846Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-12-04T08:59:04.3571677Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-12-04T08:59:04.3670417Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-12-04T08:59:04.3800656Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-12-04T08:59:04.4237587Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-12-04T08:59:04.4533334Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-12-04T08:59:04.4975987Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-12-04T08:59:04.5256409Z Submodule path 'third_party/tensorpipe': checked out '2b4cd91092d335a697416b2a3cb398283246849d' 2025-12-04T08:59:04.5275823Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:04.5277145Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:04.5279691Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:04.5283036Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:04.5311984Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'... 2025-12-04T08:59:05.4269528Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'... 2025-12-04T08:59:05.4270650Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'... 2025-12-04T08:59:05.4631898Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'... 2025-12-04T08:59:05.5233650Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-12-04T08:59:05.5396878Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-12-04T08:59:05.6163839Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-12-04T08:59:05.6459807Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-12-04T08:59:05.6477725Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:05.6504063Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'... 
2025-12-04T08:59:05.8093786Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-12-04T08:59:05.8134133Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-12-04T08:59:05.8453321Z Entering 'android/libs/fbjni' 2025-12-04T08:59:05.8496586Z Entering 'third_party/FP16' 2025-12-04T08:59:05.8539771Z Entering 'third_party/FXdiv' 2025-12-04T08:59:05.8582966Z Entering 'third_party/NNPACK' 2025-12-04T08:59:05.8628144Z Entering 'third_party/NVTX' 2025-12-04T08:59:05.8672533Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:05.8716917Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:05.8776008Z Entering 'third_party/aiter' 2025-12-04T08:59:05.8821258Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:05.8875891Z Entering 'third_party/benchmark' 2025-12-04T08:59:05.8919753Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:05.8974384Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:05.9016438Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:05.9059944Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:59:05.9103719Z Entering 'third_party/cutlass' 2025-12-04T08:59:05.9157394Z Entering 'third_party/fbgemm' 2025-12-04T08:59:05.9205142Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:05.9247258Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:05.9309672Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:05.9353330Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:05.9404409Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:05.9447357Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:05.9489898Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:05.9535147Z Entering 'third_party/flash-attention' 2025-12-04T08:59:05.9578888Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:05.9635822Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:05.9688762Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:05.9736040Z Entering 'third_party/fmt' 2025-12-04T08:59:05.9778324Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:05.9821781Z Entering 'third_party/gloo' 2025-12-04T08:59:05.9866716Z Entering 'third_party/googletest' 2025-12-04T08:59:05.9912351Z Entering 'third_party/ideep' 2025-12-04T08:59:05.9954659Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:06.0006605Z Entering 'third_party/ittapi' 2025-12-04T08:59:06.0050751Z Entering 'third_party/kineto' 2025-12-04T08:59:06.0094981Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:06.0135958Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:06.0179743Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:06.0226913Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:06.0270707Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:06.0313124Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:06.0357162Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:06.0399841Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:06.0444312Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:06.0494854Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:06.0537066Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:06.0580649Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:06.0625275Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:06.0674288Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:06.0718195Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:06.0763818Z Entering 'third_party/kleidiai' 2025-12-04T08:59:06.0807666Z Entering 'third_party/mimalloc' 2025-12-04T08:59:06.0852617Z Entering 'third_party/nlohmann' 2025-12-04T08:59:06.0898147Z Entering 'third_party/onnx' 2025-12-04T08:59:06.0962355Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:06.1008282Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:06.1053976Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:06.1095913Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:06.1137217Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:06.1179132Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:06.1227097Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:06.1273165Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:06.1315764Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:06.1357231Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:06.1402234Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:06.1447574Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:06.1522155Z Entering 'third_party/pocketfft' 2025-12-04T08:59:06.1566954Z Entering 'third_party/protobuf' 2025-12-04T08:59:06.1615100Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:06.1655160Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:06.1699193Z Entering 'third_party/psimd' 2025-12-04T08:59:06.1746466Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:06.1792483Z Entering 'third_party/pybind11' 2025-12-04T08:59:06.1836045Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:06.1879429Z Entering 'third_party/sleef' 2025-12-04T08:59:06.1924989Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:06.1968857Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:06.2017664Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:06.2059688Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:06.2102404Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:06.2144865Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:06.2202218Z ##[endgroup] 2025-12-04T08:59:06.2202909Z ##[group]Persisting credentials for submodules 2025-12-04T08:59:06.2210432Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 
'url.https://github.com/.insteadOf' || :" 2025-12-04T08:59:06.2525730Z Entering 'android/libs/fbjni' 2025-12-04T08:59:06.2583437Z Entering 'third_party/FP16' 2025-12-04T08:59:06.2640263Z Entering 'third_party/FXdiv' 2025-12-04T08:59:06.2697451Z Entering 'third_party/NNPACK' 2025-12-04T08:59:06.2756281Z Entering 'third_party/NVTX' 2025-12-04T08:59:06.2818655Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:06.2874972Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:06.2949341Z Entering 'third_party/aiter' 2025-12-04T08:59:06.3007347Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:06.3081324Z Entering 'third_party/benchmark' 2025-12-04T08:59:06.3140087Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:06.3206593Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:06.3264246Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:06.3323908Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:59:06.3382083Z Entering 'third_party/cutlass' 2025-12-04T08:59:06.3446918Z Entering 'third_party/fbgemm' 2025-12-04T08:59:06.3508778Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:06.3569530Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:06.3637498Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:06.3695238Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:06.3762149Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:06.3821037Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:06.3875837Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:06.3937730Z Entering 'third_party/flash-attention' 2025-12-04T08:59:06.4003538Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:06.4074325Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:06.4149694Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:06.4212429Z Entering 'third_party/fmt' 2025-12-04T08:59:06.4270588Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:06.4330114Z Entering 'third_party/gloo' 2025-12-04T08:59:06.4391928Z Entering 'third_party/googletest' 2025-12-04T08:59:06.4452216Z Entering 'third_party/ideep' 2025-12-04T08:59:06.4509272Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:06.4574814Z Entering 'third_party/ittapi' 2025-12-04T08:59:06.4631838Z Entering 'third_party/kineto' 2025-12-04T08:59:06.4689489Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:06.4753480Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:06.4815309Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:06.4872040Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:06.4930033Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:06.4986109Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:06.5048740Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:06.5111675Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:06.5174338Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:06.5233435Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:06.5290262Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:06.5351606Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:06.5414424Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:06.5476868Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:06.5534475Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:06.5593327Z Entering 'third_party/kleidiai' 2025-12-04T08:59:06.5649998Z Entering 'third_party/mimalloc' 2025-12-04T08:59:06.5710509Z Entering 'third_party/nlohmann' 2025-12-04T08:59:06.5772665Z Entering 'third_party/onnx' 2025-12-04T08:59:06.5852893Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:06.5916284Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:06.5975644Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:06.6031631Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:06.6088487Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:06.6153654Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:06.6220698Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:06.6275895Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:06.6333376Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:06.6394948Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:06.6453802Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:06.6514141Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:06.6590301Z Entering 'third_party/pocketfft' 2025-12-04T08:59:06.6648814Z Entering 'third_party/protobuf' 2025-12-04T08:59:06.6712428Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:06.6768071Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:06.6834525Z Entering 'third_party/psimd' 2025-12-04T08:59:06.6894083Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:06.6949665Z Entering 'third_party/pybind11' 2025-12-04T08:59:06.7006551Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:06.7064860Z Entering 'third_party/sleef' 2025-12-04T08:59:06.7123239Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:06.7180387Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:06.7235883Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:06.7291307Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:06.7351676Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:06.7407253Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:06.7492336Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-12-04T08:59:06.7808016Z Entering 'android/libs/fbjni' 2025-12-04T08:59:06.7867744Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T08:59:06.7884144Z Entering 'third_party/FP16' 2025-12-04T08:59:06.7938648Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T08:59:06.7955995Z Entering 'third_party/FXdiv' 2025-12-04T08:59:06.8009004Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T08:59:06.8027660Z Entering 'third_party/NNPACK' 2025-12-04T08:59:06.8078483Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T08:59:06.8095668Z Entering 'third_party/NVTX' 2025-12-04T08:59:06.8148836Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T08:59:06.8166623Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:06.8219309Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T08:59:06.8237002Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:06.8290316Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T08:59:06.8325910Z Entering 'third_party/aiter' 2025-12-04T08:59:06.8382371Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T08:59:06.8400432Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:06.8451855Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T08:59:06.8478195Z Entering 'third_party/benchmark' 2025-12-04T08:59:06.8530870Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:59:06.8549081Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:06.8601480Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T08:59:06.8629196Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:06.8681604Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T08:59:06.8697959Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:06.8752158Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T08:59:06.8770129Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:59:06.8822897Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T08:59:06.8840649Z Entering 'third_party/cutlass' 2025-12-04T08:59:06.8895138Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T08:59:06.8921118Z Entering 'third_party/fbgemm' 2025-12-04T08:59:06.8974209Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T08:59:06.8994650Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:06.9045478Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T08:59:06.9061336Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:06.9116043Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T08:59:06.9141087Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:06.9194550Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T08:59:06.9211796Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:06.9262530Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T08:59:06.9288811Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:06.9340275Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T08:59:06.9358269Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:06.9413856Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T08:59:06.9430647Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:06.9481950Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T08:59:06.9500766Z Entering 'third_party/flash-attention' 2025-12-04T08:59:06.9553765Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T08:59:06.9571284Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:06.9623453Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T08:59:06.9646926Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:06.9699950Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T08:59:06.9727007Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:06.9779222Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T08:59:06.9800062Z Entering 'third_party/fmt' 2025-12-04T08:59:06.9852300Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T08:59:06.9870262Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:06.9924978Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T08:59:06.9940742Z Entering 'third_party/gloo' 2025-12-04T08:59:06.9994397Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T08:59:07.0011500Z Entering 'third_party/googletest' 2025-12-04T08:59:07.0062489Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:59:07.0080305Z Entering 'third_party/ideep' 2025-12-04T08:59:07.0133179Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T08:59:07.0149619Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:07.0201492Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T08:59:07.0227864Z Entering 'third_party/ittapi' 2025-12-04T08:59:07.0279348Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T08:59:07.0295117Z Entering 'third_party/kineto' 2025-12-04T08:59:07.0347957Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T08:59:07.0365194Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:07.0418417Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T08:59:07.0435704Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:07.0488670Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T08:59:07.0508378Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:07.0561894Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T08:59:07.0577881Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:07.0632277Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T08:59:07.0649584Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:07.0702377Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T08:59:07.0719338Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:07.0773824Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T08:59:07.0793960Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:07.0848012Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T08:59:07.0863993Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:07.0918813Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:59:07.0935640Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:07.0988881Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T08:59:07.1008443Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:07.1060751Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T08:59:07.1078430Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:07.1132044Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T08:59:07.1149422Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:07.1201607Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T08:59:07.1219513Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:07.1272347Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T08:59:07.1294124Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:07.1346085Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T08:59:07.1363520Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:07.1416136Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T08:59:07.1434583Z Entering 'third_party/kleidiai' 2025-12-04T08:59:07.1487264Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T08:59:07.1504314Z Entering 'third_party/mimalloc' 2025-12-04T08:59:07.1557681Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T08:59:07.1574736Z Entering 'third_party/nlohmann' 2025-12-04T08:59:07.1627116Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T08:59:07.1646326Z Entering 'third_party/onnx' 2025-12-04T08:59:07.1699987Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T08:59:07.1735216Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:07.1787220Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:59:07.1807164Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:07.1859465Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T08:59:07.1878497Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:07.1934516Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:59:07.1952818Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:07.2005001Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:59:07.2020483Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:07.2074244Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T08:59:07.2089989Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:07.2143039Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T08:59:07.2162548Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:07.2215741Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T08:59:07.2232626Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:07.2284036Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T08:59:07.2299825Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:07.2353753Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T08:59:07.2369519Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:07.2424881Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T08:59:07.2444322Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:07.2497385Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T08:59:07.2519052Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:07.2571150Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T08:59:07.2611147Z Entering 'third_party/pocketfft' 2025-12-04T08:59:07.2663855Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T08:59:07.2680313Z Entering 'third_party/protobuf' 2025-12-04T08:59:07.2735427Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T08:59:07.2755359Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:07.2808249Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:59:07.2824003Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:07.2876945Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:59:07.2896085Z Entering 'third_party/psimd' 2025-12-04T08:59:07.2949221Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T08:59:07.2966586Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:07.3019631Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T08:59:07.3037213Z Entering 'third_party/pybind11' 2025-12-04T08:59:07.3090263Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:59:07.3108556Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:07.3160373Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T08:59:07.3176356Z Entering 'third_party/sleef' 2025-12-04T08:59:07.3229217Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T08:59:07.3246650Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:07.3300031Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T08:59:07.3317396Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:07.3368110Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:59:07.3383980Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:07.3440570Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T08:59:07.3456283Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:07.3510891Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T08:59:07.3527983Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:07.3578889Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:59:07.3595703Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:07.3647679Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T08:59:07.4258641Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-12-04T08:59:07.4583557Z Entering 'android/libs/fbjni' 2025-12-04T08:59:07.4628743Z Entering 'third_party/FP16' 2025-12-04T08:59:07.4671427Z Entering 'third_party/FXdiv' 2025-12-04T08:59:07.4716023Z Entering 'third_party/NNPACK' 2025-12-04T08:59:07.4759416Z Entering 'third_party/NVTX' 2025-12-04T08:59:07.4804311Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:07.4849115Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:07.4911011Z Entering 'third_party/aiter' 2025-12-04T08:59:07.4955682Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:07.5008468Z Entering 'third_party/benchmark' 2025-12-04T08:59:07.5053191Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:07.5106602Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:07.5150744Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:07.5195225Z Entering 
'third_party/cudnn_frontend' 2025-12-04T08:59:07.5239274Z Entering 'third_party/cutlass' 2025-12-04T08:59:07.5296381Z Entering 'third_party/fbgemm' 2025-12-04T08:59:07.5342493Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:07.5384736Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:07.5437694Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:07.5480557Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:07.5534465Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:07.5579262Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:07.5621223Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:07.5667014Z Entering 'third_party/flash-attention' 2025-12-04T08:59:07.5713720Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:07.5763007Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:07.5816185Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:07.5863659Z Entering 'third_party/fmt' 2025-12-04T08:59:07.5909940Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:07.5954267Z Entering 'third_party/gloo' 2025-12-04T08:59:07.5998278Z Entering 'third_party/googletest' 2025-12-04T08:59:07.6042766Z Entering 'third_party/ideep' 2025-12-04T08:59:07.6085790Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:07.6136425Z Entering 'third_party/ittapi' 2025-12-04T08:59:07.6179385Z Entering 'third_party/kineto' 2025-12-04T08:59:07.6222090Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:07.6264753Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:07.6313011Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:07.6356473Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:07.6399441Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:07.6440991Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:07.6486112Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:07.6530542Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:07.6577154Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:07.6620793Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:07.6662853Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:07.6709888Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:07.6758444Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:07.6806135Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:07.6848513Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:07.6900515Z Entering 'third_party/kleidiai' 2025-12-04T08:59:07.6944740Z Entering 'third_party/mimalloc' 2025-12-04T08:59:07.6989883Z Entering 'third_party/nlohmann' 2025-12-04T08:59:07.7033161Z Entering 'third_party/onnx' 2025-12-04T08:59:07.7095185Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:07.7143065Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:07.7189280Z Entering 
'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:07.7231999Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:07.7274261Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:07.7316966Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:07.7360792Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:07.7402688Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:07.7449634Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:07.7500125Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:07.7543717Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:07.7588134Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:07.7653077Z Entering 'third_party/pocketfft' 2025-12-04T08:59:07.7695829Z Entering 'third_party/protobuf' 2025-12-04T08:59:07.7742715Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:07.7784262Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:07.7829806Z Entering 'third_party/psimd' 2025-12-04T08:59:07.7872991Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:07.7917387Z Entering 'third_party/pybind11' 2025-12-04T08:59:07.7960654Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:07.8004983Z Entering 'third_party/sleef' 2025-12-04T08:59:07.8049833Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:07.8095019Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:07.8136108Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:07.8177468Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:07.8219281Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:07.8259976Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:07.8323731Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-12-04T08:59:07.8645452Z Entering 'android/libs/fbjni' 2025-12-04T08:59:07.8688280Z Entering 'third_party/FP16' 2025-12-04T08:59:07.8732307Z Entering 'third_party/FXdiv' 2025-12-04T08:59:07.8775418Z Entering 'third_party/NNPACK' 2025-12-04T08:59:07.8817919Z Entering 'third_party/NVTX' 2025-12-04T08:59:07.8860515Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:07.8906808Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:07.8965894Z Entering 'third_party/aiter' 2025-12-04T08:59:07.9011801Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:07.9062853Z Entering 'third_party/benchmark' 2025-12-04T08:59:07.9111295Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:07.9163296Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:07.9207197Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:07.9251388Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:59:07.9295093Z Entering 'third_party/cutlass' 2025-12-04T08:59:07.9349186Z Entering 'third_party/fbgemm' 2025-12-04T08:59:07.9396701Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:07.9439966Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:07.9492284Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:07.9534967Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:07.9586234Z 
Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:07.9629743Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:07.9671393Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:07.9718886Z Entering 'third_party/flash-attention' 2025-12-04T08:59:07.9764098Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:07.9818264Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:07.9873604Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:07.9921173Z Entering 'third_party/fmt' 2025-12-04T08:59:07.9966998Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:08.0011706Z Entering 'third_party/gloo' 2025-12-04T08:59:08.0054951Z Entering 'third_party/googletest' 2025-12-04T08:59:08.0097359Z Entering 'third_party/ideep' 2025-12-04T08:59:08.0139390Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:08.0189182Z Entering 'third_party/ittapi' 2025-12-04T08:59:08.0232144Z Entering 'third_party/kineto' 2025-12-04T08:59:08.0274329Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:08.0317155Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:08.0362451Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:08.0406856Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:08.0451129Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:08.0494891Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:08.0537580Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:08.0581526Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:08.0624359Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:08.0673152Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:08.0718264Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:08.0760206Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:08.0807374Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:08.0857554Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:08.0901562Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:08.0944884Z Entering 'third_party/kleidiai' 2025-12-04T08:59:08.0990157Z Entering 'third_party/mimalloc' 2025-12-04T08:59:08.1034515Z Entering 'third_party/nlohmann' 2025-12-04T08:59:08.1079193Z Entering 'third_party/onnx' 2025-12-04T08:59:08.1143368Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:08.1187484Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:08.1232910Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:08.1275061Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:08.1318197Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:08.1360385Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:08.1405561Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:08.1448562Z Entering 
'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:08.1492892Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:08.1535083Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:08.1577984Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:08.1621722Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:08.1686683Z Entering 'third_party/pocketfft' 2025-12-04T08:59:08.1730913Z Entering 'third_party/protobuf' 2025-12-04T08:59:08.1776935Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:08.1820714Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:08.1871634Z Entering 'third_party/psimd' 2025-12-04T08:59:08.1916130Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:08.1958849Z Entering 'third_party/pybind11' 2025-12-04T08:59:08.2002786Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:08.2046410Z Entering 'third_party/sleef' 2025-12-04T08:59:08.2093502Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:08.2135115Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:08.2177223Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:08.2219263Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:08.2261195Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:08.2302891Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:08.2362366Z ##[endgroup] 2025-12-04T08:59:08.2401025Z [command]/usr/bin/git log -1 --format=%H 2025-12-04T08:59:08.2427157Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:59:08.2536657Z ##[group]Run cd "${GITHUB_WORKSPACE}" 2025-12-04T08:59:08.2537068Z cd "${GITHUB_WORKSPACE}" 2025-12-04T08:59:08.2537419Z # Clean stale submodule dirs 2025-12-04T08:59:08.2537789Z if [ -z "${NO_SUDO}" ]; then 2025-12-04T08:59:08.2538246Z  sudo git submodule foreach --recursive git clean -ffdx 2025-12-04T08:59:08.2538681Z else 2025-12-04T08:59:08.2539031Z  git submodule foreach --recursive git clean -ffdx 2025-12-04T08:59:08.2539452Z fi 2025-12-04T08:59:08.2548312Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:08.2548705Z env: 2025-12-04T08:59:08.2549086Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:08.2549348Z NO_SUDO: true 2025-12-04T08:59:08.2549678Z ##[endgroup] 2025-12-04T08:59:08.2879614Z Entering 'android/libs/fbjni' 2025-12-04T08:59:08.2913079Z Entering 'third_party/FP16' 2025-12-04T08:59:08.2944419Z Entering 'third_party/FXdiv' 2025-12-04T08:59:08.2977084Z Entering 'third_party/NNPACK' 2025-12-04T08:59:08.3014208Z Entering 'third_party/NVTX' 2025-12-04T08:59:08.3053660Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:08.3087318Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:08.3210350Z Entering 'third_party/aiter' 2025-12-04T08:59:08.3253684Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:08.3359572Z Entering 'third_party/benchmark' 2025-12-04T08:59:08.3394567Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:08.3510982Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:08.3544814Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:08.3583745Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:59:08.3622628Z Entering 'third_party/cutlass' 2025-12-04T08:59:08.3725501Z Entering 'third_party/fbgemm' 2025-12-04T08:59:08.3785342Z Entering 
'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:08.3822109Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:08.3934984Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:08.3974622Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:08.4075335Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:08.4111533Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:08.4140836Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:08.4186001Z Entering 'third_party/flash-attention' 2025-12-04T08:59:08.4227845Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:08.4322745Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:08.4413681Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:08.4480293Z Entering 'third_party/fmt' 2025-12-04T08:59:08.4515446Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:08.4550884Z Entering 'third_party/gloo' 2025-12-04T08:59:08.4587044Z Entering 'third_party/googletest' 2025-12-04T08:59:08.4622854Z Entering 'third_party/ideep' 2025-12-04T08:59:08.4654510Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:08.4742926Z Entering 'third_party/ittapi' 2025-12-04T08:59:08.4777589Z Entering 'third_party/kineto' 2025-12-04T08:59:08.4814595Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:08.4855281Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:08.4900198Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:08.4933396Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:08.4967526Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:08.4999393Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:08.5032485Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:08.5071262Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:08.5105054Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:08.5147827Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:08.5179552Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:08.5213733Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:08.5260884Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:08.5300997Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:08.5340391Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:08.5376808Z Entering 'third_party/kleidiai' 2025-12-04T08:59:08.5425735Z Entering 'third_party/mimalloc' 2025-12-04T08:59:08.5460031Z Entering 'third_party/nlohmann' 2025-12-04T08:59:08.5506270Z Entering 'third_party/onnx' 2025-12-04T08:59:08.5806419Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:08.5848813Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:08.5902966Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:08.5934997Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:08.5969618Z Entering 
'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:08.6001648Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:08.6046001Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:08.6077205Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:08.6111047Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:08.6142375Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:08.6195100Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:08.6234752Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:08.6474721Z Entering 'third_party/pocketfft' 2025-12-04T08:59:08.6508755Z Entering 'third_party/protobuf' 2025-12-04T08:59:08.6582895Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:08.6617855Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:08.6654491Z Entering 'third_party/psimd' 2025-12-04T08:59:08.6686003Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:08.6718744Z Entering 'third_party/pybind11' 2025-12-04T08:59:08.6754044Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:08.6787263Z Entering 'third_party/sleef' 2025-12-04T08:59:08.6822608Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:08.6857271Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:08.6893679Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:08.6931354Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:08.6968161Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:08.6999488Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:08.7178904Z Prepare all required actions 2025-12-04T08:59:08.7179534Z Getting action download info 2025-12-04T08:59:08.8811962Z ##[group]Run ./.github/actions/setup-linux 2025-12-04T08:59:08.8812280Z env: 2025-12-04T08:59:08.8812506Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:08.8812778Z ##[endgroup] 2025-12-04T08:59:08.8851936Z ##[group]Run set -euo pipefail 2025-12-04T08:59:08.8852286Z set -euo pipefail 2025-12-04T08:59:08.8852591Z function get_ec2_metadata() { 2025-12-04T08:59:08.8853102Z  # Pulled from instance metadata endpoint for EC2 2025-12-04T08:59:08.8854004Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2025-12-04T08:59:08.8854678Z  category=$1 2025-12-04T08:59:08.8855094Z  # If it is GCP runner (runner name contains gcp), do not run this 2025-12-04T08:59:08.8855591Z  runner_name_str=i-02e8ffc45eb447a37 2025-12-04T08:59:08.8856034Z  if [[ -f /.inarc ]]; then 2025-12-04T08:59:08.8856423Z  echo "ARC Runner, no info on ec2 metadata" 2025-12-04T08:59:08.8856876Z  elif [[ $runner_name_str == *"gcp"* ]]; then 2025-12-04T08:59:08.8857419Z  echo "Runner is from Google Cloud Platform, No info on ec2 metadata" 2025-12-04T08:59:08.8857921Z  else 2025-12-04T08:59:08.8858915Z  curl -H "X-aws-ec2-metadata-token: $(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 30")" -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2025-12-04T08:59:08.8859982Z  fi 2025-12-04T08:59:08.8860229Z } 2025-12-04T08:59:08.8860690Z echo "ami-id: $(get_ec2_metadata ami-id)" 2025-12-04T08:59:08.8861172Z echo "instance-id: $(get_ec2_metadata instance-id)" 2025-12-04T08:59:08.8861733Z 
echo "instance-type: $(get_ec2_metadata instance-type)" 2025-12-04T08:59:08.8862213Z echo "system info $(uname -a)" 2025-12-04T08:59:08.8868639Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:08.8869031Z env: 2025-12-04T08:59:08.8869253Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:08.8869507Z ##[endgroup] 2025-12-04T08:59:08.9021204Z ami-id: ami-08982f1c5bf93d976 2025-12-04T08:59:08.9133161Z instance-id: i-02e8ffc45eb447a37 2025-12-04T08:59:08.9241330Z instance-type: g4dn.12xlarge 2025-12-04T08:59:08.9252533Z system info Linux ip-10-1-32-85.ec2.internal 6.1.150-174.273.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Sep 9 12:21:26 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux 2025-12-04T08:59:08.9293504Z ##[group]Run if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi 2025-12-04T08:59:08.9294061Z if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi 2025-12-04T08:59:08.9300348Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:08.9300783Z env: 2025-12-04T08:59:08.9301017Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:08.9301317Z ##[endgroup] 2025-12-04T08:59:11.0018699Z Thu Dec 4 08:59:10 2025 2025-12-04T08:59:11.0019271Z +-----------------------------------------------------------------------------------------+ 2025-12-04T08:59:11.0019900Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 | 2025-12-04T08:59:11.0020524Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T08:59:11.0021155Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T08:59:11.0021827Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-12-04T08:59:11.0022357Z | | | MIG M. | 2025-12-04T08:59:11.0022788Z |=========================================+========================+======================| 2025-12-04T08:59:11.0410116Z | 0 Tesla T4 Off | 00000000:00:1B.0 Off | 0 | 2025-12-04T08:59:11.0411347Z | N/A 32C P0 25W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T08:59:11.0411903Z | | | N/A | 2025-12-04T08:59:11.0412383Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T08:59:11.0413013Z | 1 Tesla T4 Off | 00000000:00:1C.0 Off | 0 | 2025-12-04T08:59:11.0413724Z | N/A 32C P0 25W / 70W | 0MiB / 15360MiB | 5% Default | 2025-12-04T08:59:11.0414195Z | | | N/A | 2025-12-04T08:59:11.0414682Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T08:59:11.0415231Z | 2 Tesla T4 Off | 00000000:00:1D.0 Off | 0 | 2025-12-04T08:59:11.0415756Z | N/A 33C P0 24W / 70W | 0MiB / 15360MiB | 4% Default | 2025-12-04T08:59:11.0416221Z | | | N/A | 2025-12-04T08:59:11.0416695Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T08:59:11.0417236Z | 3 Tesla T4 Off | 00000000:00:1E.0 Off | 0 | 2025-12-04T08:59:11.0417753Z | N/A 32C P0 25W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T08:59:11.0418211Z | | | N/A | 2025-12-04T08:59:11.0418705Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T08:59:11.0419214Z 2025-12-04T08:59:11.0419427Z +-----------------------------------------------------------------------------------------+ 2025-12-04T08:59:11.0419975Z | Processes: | 2025-12-04T08:59:11.0420530Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T08:59:11.0421029Z | ID ID Usage | 2025-12-04T08:59:11.0421450Z 
|=========================================================================================|
2025-12-04T08:59:11.0433000Z |  No running processes found                                                             |
2025-12-04T08:59:11.0433589Z +-----------------------------------------------------------------------------------------+
2025-12-04T08:59:12.6702684Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T08:59:12.6704727Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T08:59:12.6715819Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:59:12.6716569Z env:
2025-12-04T08:59:12.6716986Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:59:12.6717463Z ##[endgroup]
2025-12-04T08:59:12.6888520Z ##[group]Run if systemctl is-active --quiet docker; then
2025-12-04T08:59:12.6889024Z if systemctl is-active --quiet docker; then
2025-12-04T08:59:12.6889475Z   echo "Docker daemon is running...";
2025-12-04T08:59:12.6889853Z else
2025-12-04T08:59:12.6890248Z   echo "Starting docker daemon..." && sudo systemctl start docker;
2025-12-04T08:59:12.6890745Z fi
2025-12-04T08:59:12.6897261Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:59:12.6897716Z env:
2025-12-04T08:59:12.6897951Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:59:12.6898250Z ##[endgroup]
2025-12-04T08:59:12.6978265Z Docker daemon is running...
2025-12-04T08:59:12.7044165Z ##[group]Run nick-fields/retry@v3.0.0
2025-12-04T08:59:12.7044698Z with:
2025-12-04T08:59:12.7045066Z   shell: bash
2025-12-04T08:59:12.7045694Z   timeout_minutes: 5
2025-12-04T08:59:12.7046161Z   max_attempts: 3
2025-12-04T08:59:12.7046574Z   retry_wait_seconds: 30
2025-12-04T08:59:12.7051103Z   command: AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
           aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
             --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
           # For LF Runners we need to make sure we also login to Meta's ECR docker registry too.
           META_AWS_ACCOUNT_ID=308535385114
           if [ "$AWS_ACCOUNT_ID" != "$META_AWS_ACCOUNT_ID" ] ; then
             aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
               --password-stdin "$META_AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
           fi
2025-12-04T08:59:12.7056370Z   polling_interval_seconds: 1
2025-12-04T08:59:12.7056979Z   warning_on_retry: true
2025-12-04T08:59:12.7057529Z   continue_on_error: false
2025-12-04T08:59:12.7058049Z env:
2025-12-04T08:59:12.7058495Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:59:12.7059057Z   AWS_RETRY_MODE: standard
2025-12-04T08:59:12.7059594Z   AWS_MAX_ATTEMPTS: 5
2025-12-04T08:59:12.7060124Z   AWS_DEFAULT_REGION: us-east-1
2025-12-04T08:59:12.7060695Z ##[endgroup]
2025-12-04T08:59:13.8689752Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T08:59:13.8690477Z Configure a credential helper to remove this warning. See
2025-12-04T08:59:13.8691125Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T08:59:13.8691583Z
2025-12-04T08:59:13.8692134Z Login Succeeded
2025-12-04T08:59:14.4050904Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T08:59:14.4052049Z Configure a credential helper to remove this warning. See
2025-12-04T08:59:14.4052710Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T08:59:14.4053288Z
2025-12-04T08:59:14.4053584Z Login Succeeded
2025-12-04T08:59:14.7974028Z Command completed after 1 attempt(s).
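The retry command above is a plain dual ECR login. A minimal standalone sketch of the same thing, assuming the aws CLI v2 and docker are installed and credentials are available on the host (the region and Meta account ID come from the step's inputs; the --query account lookup stands in for the step's grep/cut pipeline):

  AWS_DEFAULT_REGION=us-east-1
  # Account that owns the runner's own ECR registry.
  AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
  aws ecr get-login-password --region "$AWS_DEFAULT_REGION" \
    | docker login --username AWS --password-stdin "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com"
  # LF runners additionally log in to Meta's registry (account 308535385114) so the CI image can be pulled.
  aws ecr get-login-password --region "$AWS_DEFAULT_REGION" \
    | docker login --username AWS --password-stdin "308535385114.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com"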
2025-12-04T08:59:14.8031665Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T08:59:14.8032208Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T08:59:14.8032698Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T08:59:14.8039422Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:59:14.8039806Z env:
2025-12-04T08:59:14.8040031Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:59:14.8040305Z ##[endgroup]
2025-12-04T08:59:14.8263846Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty
2025-12-04T08:59:14.8264499Z # ignore expansion of "docker ps -q" since it could be empty
2025-12-04T08:59:14.8265004Z # shellcheck disable=SC2046
2025-12-04T08:59:14.8265393Z docker stop $(docker ps -q) || true
2025-12-04T08:59:14.8265900Z # Prune all of the docker images
2025-12-04T08:59:14.8266256Z docker system prune -af
2025-12-04T08:59:14.8272231Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:59:14.8272648Z env:
2025-12-04T08:59:14.8272873Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:59:14.8273158Z ##[endgroup]
2025-12-04T08:59:14.8524455Z "docker stop" requires at least 1 argument.
2025-12-04T08:59:14.8524894Z See 'docker stop --help'.
2025-12-04T08:59:14.8525094Z
2025-12-04T08:59:14.8525295Z Usage:  docker stop [OPTIONS] CONTAINER [CONTAINER...]
2025-12-04T08:59:14.8525644Z
2025-12-04T08:59:14.8525766Z Stop one or more running containers
2025-12-04T08:59:14.8697714Z Total reclaimed space: 0B
2025-12-04T08:59:14.8894166Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main
2025-12-04T08:59:14.8894743Z with:
2025-12-04T08:59:14.8895681Z   docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:59:14.8896734Z   use-custom-docker-registry: true
2025-12-04T08:59:14.8897100Z   docker-build-dir: .ci/docker
2025-12-04T08:59:14.8897441Z   docker-build-script: ./build.sh
2025-12-04T08:59:14.8897771Z   working-directory: .
2025-12-04T08:59:14.8898176Z   docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T08:59:14.8898644Z   force-push: false
2025-12-04T08:59:14.8898888Z env:
2025-12-04T08:59:14.8899130Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:59:14.8899428Z ##[endgroup]
2025-12-04T08:59:14.8918665Z ##[group]Run set -ex
2025-12-04T08:59:14.8918962Z set -ex
2025-12-04T08:59:14.8919199Z
2025-12-04T08:59:14.8919650Z # If the docker build directory or the build script doesn't exist, the action will
2025-12-04T08:59:14.8920368Z # gracefully return the docker image name as it is.
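# Illustrative sketch only (hypothetical tag value, not from this run); the two tag paths the
# script below takes are, in effect:
#   - image name already carries the ECR prefix: take the part after the first ':'
#       echo "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:some-tag" | awk -F '[:,]' '{print $2}'   # -> some-tag
#   - bare image name: the tag is the git tree hash of the Docker build dir, optionally prefixed
#       git rev-parse HEAD:.ci/docker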
Pulling docker image in Linux 2025-12-04T08:59:14.8920964Z # job could then download the pre-built image as usual 2025-12-04T08:59:14.8921688Z if [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then 2025-12-04T08:59:14.8922351Z  echo "skip=false" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:14.8922691Z else 2025-12-04T08:59:14.8922960Z  echo "skip=true" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:14.8923422Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:14.8923832Z  2025-12-04T08:59:14.8924408Z  echo "Not using custom ECR registry. Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..." 2025-12-04T08:59:14.8925078Z  exit 0 2025-12-04T08:59:14.8925438Z fi 2025-12-04T08:59:14.8925658Z  2025-12-04T08:59:14.8926017Z if [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then 2025-12-04T08:59:14.8926648Z  # The docker image name already includes the ECR prefix and tag, so we can just 2025-12-04T08:59:14.8927198Z  # use it as it is, but first let's extract the tag 2025-12-04T08:59:14.8927703Z  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}') 2025-12-04T08:59:14.8928237Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:14.8928738Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:14.8929164Z else 2025-12-04T08:59:14.8929435Z  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then 2025-12-04T08:59:14.8929831Z  CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:} 2025-12-04T08:59:14.8930233Z  DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*} 2025-12-04T08:59:14.8930583Z  fi 2025-12-04T08:59:14.8931059Z  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}") 2025-12-04T08:59:14.8931701Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:14.8932374Z  echo "docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:14.8933221Z  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:14.8933864Z fi 2025-12-04T08:59:14.8940342Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:14.8940783Z env: 2025-12-04T08:59:14.8941033Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:14.8941335Z REPO_NAME: pytorch 2025-12-04T08:59:14.8942425Z DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:14.8943484Z DOCKER_BUILD_DIR: .ci/docker 2025-12-04T08:59:14.8943826Z DOCKER_BUILD_SCRIPT: ./build.sh 2025-12-04T08:59:14.8944262Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:14.8944739Z USE_CUSTOM_DOCKER_REGISTRY: true 2025-12-04T08:59:14.8945079Z CUSTOM_TAG_PREFIX: 2025-12-04T08:59:14.8945346Z ##[endgroup] 2025-12-04T08:59:14.8969165Z + [[ -d .ci/docker ]] 2025-12-04T08:59:14.8969513Z + [[ -f .ci/docker/./build.sh ]] 2025-12-04T08:59:14.8969840Z + [[ true == \t\r\u\e ]] 2025-12-04T08:59:14.8970141Z + echo skip=false 2025-12-04T08:59:14.8971361Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a == *\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]] 2025-12-04T08:59:14.8977482Z ++ echo 
308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:14.8978456Z ++ awk -F '[:,]' '{print $2}' 2025-12-04T08:59:14.9002114Z + DOCKER_TAG=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:14.9003204Z + echo docker-tag=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:14.9004581Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:14.9029139Z ##[group]Run set +e 2025-12-04T08:59:14.9029451Z set +e 2025-12-04T08:59:14.9029687Z set -x 2025-12-04T08:59:14.9029916Z  2025-12-04T08:59:14.9030122Z login() { 2025-12-04T08:59:14.9030613Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T08:59:14.9031154Z } 2025-12-04T08:59:14.9031366Z  2025-12-04T08:59:14.9031719Z retry () { 2025-12-04T08:59:14.9032006Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-12-04T08:59:14.9032322Z } 2025-12-04T08:59:14.9032543Z  2025-12-04T08:59:14.9032792Z retry login "${DOCKER_REGISTRY}" 2025-12-04T08:59:14.9033097Z  2025-12-04T08:59:14.9033325Z START_TIME=$(date +%s) 2025-12-04T08:59:14.9033633Z # Wait up to 120 minutes 2025-12-04T08:59:14.9034008Z while [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do 2025-12-04T08:59:14.9034510Z  # Check if image already exists, if it does then skip building it 2025-12-04T08:59:14.9035025Z  if docker manifest inspect "${DOCKER_IMAGE}"; then 2025-12-04T08:59:14.9035405Z  exit 0 2025-12-04T08:59:14.9035633Z  fi 2025-12-04T08:59:14.9035854Z  2025-12-04T08:59:14.9036252Z  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can 2025-12-04T08:59:14.9036946Z  # use this to differentiate between the Docker build and regular build jobs. For the 2025-12-04T08:59:14.9037623Z  # latter, it will wait for the Docker images to become available before continuing 2025-12-04T08:59:14.9060038Z  if [ "${DOCKER_PUSH:-false}" == "true" ]; then 2025-12-04T08:59:14.9060538Z  # It's a Docker build job, let's build the image 2025-12-04T08:59:14.9060943Z  break 2025-12-04T08:59:14.9061218Z  else 2025-12-04T08:59:14.9061618Z  # It's a regular build job, wait for the image to become available 2025-12-04T08:59:14.9062090Z  sleep 300 2025-12-04T08:59:14.9062377Z  fi 2025-12-04T08:59:14.9062624Z done 2025-12-04T08:59:14.9062866Z  2025-12-04T08:59:14.9063272Z # NB: This part requires a full checkout. Otherwise, the merge base will 2025-12-04T08:59:14.9064201Z # be empty. 
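# Sketch of the merge-base check performed below (BASE_REVISION and .ci/docker are taken from the
# step's env; requires a full checkout, i.e. fetch-depth: 0):
#   MERGE_BASE=$(git merge-base HEAD ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32)
#   git rev-parse "${MERGE_BASE}:.ci/docker"   # previous docker tag, compared against DOCKER_TAG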
The default action would be to continue rebuild the image 2025-12-04T08:59:14.9064816Z if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then 2025-12-04T08:59:14.9065333Z  # if we're on the base branch then use the parent commit 2025-12-04T08:59:14.9065887Z  MERGE_BASE=$(git rev-parse HEAD~) 2025-12-04T08:59:14.9066217Z else 2025-12-04T08:59:14.9066542Z  # otherwise we're on a PR, so use the most recent base commit 2025-12-04T08:59:14.9067032Z  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") 2025-12-04T08:59:14.9067408Z fi 2025-12-04T08:59:14.9067629Z  2025-12-04T08:59:14.9067856Z if [[ -z "${MERGE_BASE}" ]]; then 2025-12-04T08:59:14.9068225Z  echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:14.9068561Z  2025-12-04T08:59:14.9069026Z  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..." 2025-12-04T08:59:14.9069592Z  exit 0 2025-12-04T08:59:14.9069815Z fi 2025-12-04T08:59:14.9070015Z  2025-12-04T08:59:14.9070328Z if ! git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then 2025-12-04T08:59:14.9071036Z  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit" 2025-12-04T08:59:14.9071648Z  exit 1 2025-12-04T08:59:14.9071864Z fi 2025-12-04T08:59:14.9072078Z  2025-12-04T08:59:14.9072453Z PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}") 2025-12-04T08:59:14.9073136Z # If no image exists but the hash is the same as the previous hash then we should error out here 2025-12-04T08:59:14.9073736Z if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then 2025-12-04T08:59:14.9074444Z  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch" 2025-12-04T08:59:14.9075239Z  echo " Will re-build docker image to store in local cache, TTS may be longer" 2025-12-04T08:59:14.9075784Z fi 2025-12-04T08:59:14.9075982Z  2025-12-04T08:59:14.9076244Z echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:14.9083161Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:14.9083592Z env: 2025-12-04T08:59:14.9083836Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:14.9084135Z DOCKER_BUILD_DIR: .ci/docker 2025-12-04T08:59:14.9084527Z BASE_REVISION: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:59:14.9085610Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:14.9086946Z DOCKER_TAG: pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:14.9087738Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:14.9088207Z DOCKER_PUSH: 2025-12-04T08:59:14.9088476Z ##[endgroup] 2025-12-04T08:59:14.9113065Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:14.9113606Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:14.9116464Z + aws ecr get-login-password --region us-east-1 2025-12-04T08:59:14.9117576Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:15.4419917Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:59:15.4420637Z Configure a credential helper to remove this warning. 
See 2025-12-04T08:59:15.4421297Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:59:15.4421742Z 2025-12-04T08:59:15.4421872Z Login Succeeded 2025-12-04T08:59:15.4439441Z ++ date +%s 2025-12-04T08:59:15.4447275Z + START_TIME=1764838755 2025-12-04T08:59:15.4451085Z ++ date +%s 2025-12-04T08:59:15.4462798Z + [[ 1764831555 -lt 1764838755 ]] 2025-12-04T08:59:15.4463900Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:15.6342821Z { 2025-12-04T08:59:15.6343100Z "schemaVersion": 2, 2025-12-04T08:59:15.6343590Z "mediaType": "application/vnd.docker.distribution.manifest.v2+json", 2025-12-04T08:59:15.6344299Z "config": { 2025-12-04T08:59:15.6344684Z "mediaType": "application/vnd.docker.container.image.v1+json", 2025-12-04T08:59:15.6345147Z "size": 34864, 2025-12-04T08:59:15.6345601Z "digest": "sha256:add7313791033822205cdb3cf32096534b2cfaa4855bd48119b59000bfe00301" 2025-12-04T08:59:15.6346130Z }, 2025-12-04T08:59:15.6346350Z "layers": [ 2025-12-04T08:59:15.6346571Z { 2025-12-04T08:59:15.6346938Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6347421Z "size": 30447951, 2025-12-04T08:59:15.6347899Z "digest": "sha256:63e5bc7682b85ae57a1221210f64d62e7a90b0a30f19af4ca734b8242ae49d63" 2025-12-04T08:59:15.6348460Z }, 2025-12-04T08:59:15.6348676Z { 2025-12-04T08:59:15.6349045Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6349501Z "size": 1554, 2025-12-04T08:59:15.6349948Z "digest": "sha256:0678d56345c994444b77bb70b1177189d23e794748b1d75ffc45d227c7dea94a" 2025-12-04T08:59:15.6350467Z }, 2025-12-04T08:59:15.6350663Z { 2025-12-04T08:59:15.6351026Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6351492Z "size": 313275661, 2025-12-04T08:59:15.6351967Z "digest": "sha256:45f5c9ddfce78349dff3d5edfbaa0310ae17311f66abdcd7e00fa21b500e801c" 2025-12-04T08:59:15.6352542Z }, 2025-12-04T08:59:15.6352753Z { 2025-12-04T08:59:15.6353104Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6353569Z "size": 787, 2025-12-04T08:59:15.6354032Z "digest": "sha256:086b1df51ac1162d9c45698e9dfaf91c6c222c8bd9ab01797ac8f9344bc8044f" 2025-12-04T08:59:15.6354678Z }, 2025-12-04T08:59:15.6355264Z { 2025-12-04T08:59:15.6355630Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6356097Z "size": 106, 2025-12-04T08:59:15.6356554Z "digest": "sha256:fe8a7b64bf98352f89057bcba66beef2fb44cc05fbd3606abccd8e86cf476234" 2025-12-04T08:59:15.6357110Z }, 2025-12-04T08:59:15.6357308Z { 2025-12-04T08:59:15.6357884Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6358535Z "size": 703, 2025-12-04T08:59:15.6358988Z "digest": "sha256:7680723e9a578033dd106b45784c639f06cc8adb1f5239ec513d9de01087c1af" 2025-12-04T08:59:15.6359508Z }, 2025-12-04T08:59:15.6359727Z { 2025-12-04T08:59:15.6360096Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6360553Z "size": 1216, 2025-12-04T08:59:15.6361014Z "digest": "sha256:9c5027aeeb4e3101f48c1d2e400c387110e1009e42497ee801f1b4b7f7edb5c0" 2025-12-04T08:59:15.6361543Z }, 2025-12-04T08:59:15.6361764Z { 2025-12-04T08:59:15.6362123Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6362600Z "size": 483, 2025-12-04T08:59:15.6363042Z "digest": 
"sha256:9a56521103600bd37a1e7c1191b5136c2d738c092f8a6701499f7068a32c2628" 2025-12-04T08:59:15.6363546Z }, 2025-12-04T08:59:15.6363761Z { 2025-12-04T08:59:15.6364129Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6364584Z "size": 110361875, 2025-12-04T08:59:15.6365046Z "digest": "sha256:375c4427e9141269458333b1463fdb219e736fd6231ec1c56c625c48437ace77" 2025-12-04T08:59:15.6365667Z }, 2025-12-04T08:59:15.6365859Z { 2025-12-04T08:59:15.6366211Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6366662Z "size": 4961, 2025-12-04T08:59:15.6367095Z "digest": "sha256:a86faaa7dbdd70e678e5ea20072637ee42618921ca8f80ca089f789325d4b0c2" 2025-12-04T08:59:15.6367613Z }, 2025-12-04T08:59:15.6367821Z { 2025-12-04T08:59:15.6368298Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6368746Z "size": 1755, 2025-12-04T08:59:15.6369189Z "digest": "sha256:fb7848686804957915d98f8655ef6da0fe4c521b50a82aefdebf475983505a15" 2025-12-04T08:59:15.6369698Z }, 2025-12-04T08:59:15.6369890Z { 2025-12-04T08:59:15.6370253Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6370709Z "size": 724, 2025-12-04T08:59:15.6371129Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:59:15.6371632Z }, 2025-12-04T08:59:15.6371836Z { 2025-12-04T08:59:15.6372176Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6372626Z "size": 543, 2025-12-04T08:59:15.6373179Z "digest": "sha256:79dc80f426b29d4ae9157b967050b03e66aa0c4b1295b944a1dd70106be87066" 2025-12-04T08:59:15.6373875Z }, 2025-12-04T08:59:15.6374076Z { 2025-12-04T08:59:15.6374460Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6374949Z "size": 3185190117, 2025-12-04T08:59:15.6375441Z "digest": "sha256:a13fcc1b90bb9c251ebe7ef2a03c4cb3afa1c8bdafe84f5f85136773059a3735" 2025-12-04T08:59:15.6375991Z }, 2025-12-04T08:59:15.6376206Z { 2025-12-04T08:59:15.6376568Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6377049Z "size": 32, 2025-12-04T08:59:15.6377518Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:15.6378053Z }, 2025-12-04T08:59:15.6378269Z { 2025-12-04T08:59:15.6378864Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6379341Z "size": 396, 2025-12-04T08:59:15.6379807Z "digest": "sha256:549db4d6c618ecd9534658a233e3c90508f82d8735f965c2786b2eaa078869e5" 2025-12-04T08:59:15.6380345Z }, 2025-12-04T08:59:15.6380566Z { 2025-12-04T08:59:15.6380930Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6381425Z "size": 236860, 2025-12-04T08:59:15.6382020Z "digest": "sha256:5c63528cb580001e65104f4cb0809bf0673a00f989a7db42fd6d86aa1ec27cee" 2025-12-04T08:59:15.6382546Z }, 2025-12-04T08:59:15.6382766Z { 2025-12-04T08:59:15.6383141Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6383608Z "size": 231, 2025-12-04T08:59:15.6384073Z "digest": "sha256:75bd83b989a44e4d4119a3f972891025eb0e9ce95cfbe4a0ca5cdbe7130028d6" 2025-12-04T08:59:15.6384824Z }, 2025-12-04T08:59:15.6385006Z { 2025-12-04T08:59:15.6385340Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6385774Z "size": 3043497, 2025-12-04T08:59:15.6386185Z "digest": "sha256:de6e78970f517178cb91f36cd02bd9ca7b72a08fb82a0f9007516026f258c035" 
2025-12-04T08:59:15.6386666Z }, 2025-12-04T08:59:15.6386861Z { 2025-12-04T08:59:15.6387193Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6387608Z "size": 1472, 2025-12-04T08:59:15.6388038Z "digest": "sha256:e13ed7c7e4736e81dc21af755b3363eb26e4d3b2f1ca988dfe65effa47d8fa42" 2025-12-04T08:59:15.6388536Z }, 2025-12-04T08:59:15.6388717Z { 2025-12-04T08:59:15.6389050Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6389476Z "size": 481, 2025-12-04T08:59:15.6389880Z "digest": "sha256:6e2949bcb74152577a0f20c38bcb6dd80f5e68427e3e531a80e08c9ecc73a979" 2025-12-04T08:59:15.6390366Z }, 2025-12-04T08:59:15.6390564Z { 2025-12-04T08:59:15.6390891Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6391321Z "size": 202, 2025-12-04T08:59:15.6391748Z "digest": "sha256:14d69d9aaec70287efd2fd35c4f93e43a29a4098458cc9fca1c93f02ad7356cb" 2025-12-04T08:59:15.6392241Z }, 2025-12-04T08:59:15.6392423Z { 2025-12-04T08:59:15.6392755Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6393181Z "size": 607, 2025-12-04T08:59:15.6393691Z "digest": "sha256:5c02769dd8e5bba2f7f5fd84bde9595fcb3bdbffcae497503fa846f9b5e78bf5" 2025-12-04T08:59:15.6394193Z }, 2025-12-04T08:59:15.6394388Z { 2025-12-04T08:59:15.6394709Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6395148Z "size": 7889619584, 2025-12-04T08:59:15.6395592Z "digest": "sha256:35041ce524ac4afec40ecd73b1393c830614f1f79d43a6439767a6c7d5b7027b" 2025-12-04T08:59:15.6396065Z }, 2025-12-04T08:59:15.6396264Z { 2025-12-04T08:59:15.6396603Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6397021Z "size": 830, 2025-12-04T08:59:15.6397441Z "digest": "sha256:2fa92dc5885e080e049ceb4139288b6c0e39fab34256945708b08ea55a1f7a0b" 2025-12-04T08:59:15.6397926Z }, 2025-12-04T08:59:15.6398124Z { 2025-12-04T08:59:15.6398449Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6398886Z "size": 33451739, 2025-12-04T08:59:15.6399327Z "digest": "sha256:2b85eafbd92a0e70a0a70154ad8bf4584095e576d95873368f30373f5966714a" 2025-12-04T08:59:15.6399803Z }, 2025-12-04T08:59:15.6400003Z { 2025-12-04T08:59:15.6400342Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6400759Z "size": 104, 2025-12-04T08:59:15.6401186Z "digest": "sha256:ff755a4ddad7880f23c6b767d432d6f1eafdb62b3ea18f8a98e22c441c099fcb" 2025-12-04T08:59:15.6401682Z }, 2025-12-04T08:59:15.6401867Z { 2025-12-04T08:59:15.6402207Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6402637Z "size": 1496, 2025-12-04T08:59:15.6403039Z "digest": "sha256:09eb41bdf42d8605b57b2363348154140904dec914b34a67298b82122bfce2b3" 2025-12-04T08:59:15.6403519Z }, 2025-12-04T08:59:15.6403713Z { 2025-12-04T08:59:15.6404048Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6404463Z "size": 458787828, 2025-12-04T08:59:15.6404891Z "digest": "sha256:11ede4d59e935e62f41b33220fe871794ab5e57ce724173b713368977683bcf6" 2025-12-04T08:59:15.6405371Z }, 2025-12-04T08:59:15.6405620Z { 2025-12-04T08:59:15.6405957Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6406384Z "size": 164, 2025-12-04T08:59:15.6406783Z "digest": "sha256:1283cd8f801a142172f3ab76fd472df8583223d9437de3e4d18d8cf98ea3fa98" 2025-12-04T08:59:15.6407265Z }, 2025-12-04T08:59:15.6407460Z { 2025-12-04T08:59:15.6407784Z 
"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6408214Z "size": 346, 2025-12-04T08:59:15.6408632Z "digest": "sha256:024fa855425fa524ad4500660cf61d53be62b99556d31b8b280d14caba434a35" 2025-12-04T08:59:15.6409115Z }, 2025-12-04T08:59:15.6409298Z { 2025-12-04T08:59:15.6409634Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6410061Z "size": 32, 2025-12-04T08:59:15.6410466Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:15.6410954Z }, 2025-12-04T08:59:15.6411152Z { 2025-12-04T08:59:15.6411477Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6411903Z "size": 106, 2025-12-04T08:59:15.6412324Z "digest": "sha256:303e6747a62efecf5efa1f97d0e66b40a3b39da8d79a51f75b89f4c92ae7ec52" 2025-12-04T08:59:15.6412801Z }, 2025-12-04T08:59:15.6413092Z { 2025-12-04T08:59:15.6413612Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6414083Z "size": 424, 2025-12-04T08:59:15.6414591Z "digest": "sha256:3017cdf4838bcc9a33daebc07487f8ae1f6bd6e7ce8322c14f5480e8db9ef90e" 2025-12-04T08:59:15.6415149Z }, 2025-12-04T08:59:15.6415368Z { 2025-12-04T08:59:15.6415731Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6416208Z "size": 19309374, 2025-12-04T08:59:15.6416698Z "digest": "sha256:6b6cd1c358e886dc6ed7fd46ac4bcc1a0a73b7b1301739ea1953478ee5d83f50" 2025-12-04T08:59:15.6417235Z }, 2025-12-04T08:59:15.6417451Z { 2025-12-04T08:59:15.6417908Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6418383Z "size": 108, 2025-12-04T08:59:15.6418853Z "digest": "sha256:b2dd045011241d1cf8889e2a7369d9fe4844dfe15529b520ccd6a59bd3c1532e" 2025-12-04T08:59:15.6419394Z }, 2025-12-04T08:59:15.6419694Z { 2025-12-04T08:59:15.6420031Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6420464Z "size": 827, 2025-12-04T08:59:15.6420869Z "digest": "sha256:55adc51fe5897031d4cf2f2b8fd162213f6e46a52848630c616606271b97952e" 2025-12-04T08:59:15.6421351Z }, 2025-12-04T08:59:15.6421547Z { 2025-12-04T08:59:15.6421882Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6422296Z "size": 724, 2025-12-04T08:59:15.6422702Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:59:15.6423175Z }, 2025-12-04T08:59:15.6423357Z { 2025-12-04T08:59:15.6423694Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6424127Z "size": 149, 2025-12-04T08:59:15.6424524Z "digest": "sha256:a43ca0e4b837964b12b7469194cfe939c26de027298040028975324dce25938a" 2025-12-04T08:59:15.6425072Z + exit 0 2025-12-04T08:59:15.6425261Z }, 2025-12-04T08:59:15.6425458Z { 2025-12-04T08:59:15.6425793Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6426210Z "size": 138, 2025-12-04T08:59:15.6426625Z "digest": "sha256:b7212f17fd1404837fcfdd086dd0e2667931e4db377d45d8d89a44390c84e11d" 2025-12-04T08:59:15.6427117Z }, 2025-12-04T08:59:15.6427301Z { 2025-12-04T08:59:15.6427641Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6428071Z "size": 141, 2025-12-04T08:59:15.6428474Z "digest": "sha256:083e42cac090e6486c35f392b64ee54448f5e4aa947003aeb3e1f92c8ea5c099" 2025-12-04T08:59:15.6428958Z }, 2025-12-04T08:59:15.6429151Z { 2025-12-04T08:59:15.6429486Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 
2025-12-04T08:59:15.6429996Z "size": 32, 2025-12-04T08:59:15.6430417Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:15.6430908Z }, 2025-12-04T08:59:15.6431092Z { 2025-12-04T08:59:15.6431432Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6431865Z "size": 223, 2025-12-04T08:59:15.6432273Z "digest": "sha256:0a00b784a4aac341795729b254f7edd09e811b7f51d0c58e0e6bfeeee6940503" 2025-12-04T08:59:15.6432761Z }, 2025-12-04T08:59:15.6432959Z { 2025-12-04T08:59:15.6433283Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6433713Z "size": 255, 2025-12-04T08:59:15.6434129Z "digest": "sha256:c6173c779f7ba143a21214ea5f032b141863a37ceb4c0ac01d3248c216ce5241" 2025-12-04T08:59:15.6434613Z }, 2025-12-04T08:59:15.6434799Z { 2025-12-04T08:59:15.6435141Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6435583Z "size": 145520672, 2025-12-04T08:59:15.6436011Z "digest": "sha256:ed3d1e3387b924585c332bf1bc252fa159cd0d25256a874043ff0141b1ab5ff7" 2025-12-04T08:59:15.6436496Z }, 2025-12-04T08:59:15.6436700Z { 2025-12-04T08:59:15.6437025Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6437457Z "size": 106, 2025-12-04T08:59:15.6437868Z "digest": "sha256:b29343478586aeee19d2a622661716f6f1591280c890f49b727a8da13a610784" 2025-12-04T08:59:15.6438332Z }, 2025-12-04T08:59:15.6438529Z { 2025-12-04T08:59:15.6438862Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6439279Z "size": 312293530, 2025-12-04T08:59:15.6439709Z "digest": "sha256:c6f0520487fb506bc4601fd84d5f28d8a76b203e004731e4b2067c2ab1a14e0b" 2025-12-04T08:59:15.6440187Z }, 2025-12-04T08:59:15.6440381Z { 2025-12-04T08:59:15.6440703Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6441131Z "size": 3058011133, 2025-12-04T08:59:15.6441626Z "digest": "sha256:148171691cd4c4d20310d490d4b4dd903490d04ea07fb8f7e668a28768683e9a" 2025-12-04T08:59:15.6442099Z }, 2025-12-04T08:59:15.6442292Z { 2025-12-04T08:59:15.6442630Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6443043Z "size": 129, 2025-12-04T08:59:15.6443468Z "digest": "sha256:2c666d30ed77fff9ff1167d41cd645dad98280fcbe941f5bc3828c7ae66b1287" 2025-12-04T08:59:15.6443961Z }, 2025-12-04T08:59:15.6444143Z { 2025-12-04T08:59:15.6444476Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6444904Z "size": 880, 2025-12-04T08:59:15.6445316Z "digest": "sha256:5d8d3a0a98e012c5068e0f3bae5a03e3148ecf2d063634eee4c9241a1e3fdfb5" 2025-12-04T08:59:15.6445810Z }, 2025-12-04T08:59:15.6446002Z { 2025-12-04T08:59:15.6446336Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6446750Z "size": 724, 2025-12-04T08:59:15.6447162Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:59:15.6447640Z }, 2025-12-04T08:59:15.6447820Z { 2025-12-04T08:59:15.6448150Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6448577Z "size": 139, 2025-12-04T08:59:15.6448977Z "digest": "sha256:b06bafce9e817295d8127207747c80aa18e04392ff0875844fc30a1e794a8a0c" 2025-12-04T08:59:15.6449459Z }, 2025-12-04T08:59:15.6449655Z { 2025-12-04T08:59:15.6449977Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6450406Z "size": 32, 2025-12-04T08:59:15.6450826Z "digest": 
"sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:15.6451314Z }, 2025-12-04T08:59:15.6451494Z { 2025-12-04T08:59:15.6451828Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6452261Z "size": 159, 2025-12-04T08:59:15.6452672Z "digest": "sha256:15e0d7e4590d3d8f598d05aec3a92f891bf8b4605bcc38cc2de852b6014ef8f3" 2025-12-04T08:59:15.6453317Z }, 2025-12-04T08:59:15.6453700Z { 2025-12-04T08:59:15.6454065Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6454547Z "size": 1011, 2025-12-04T08:59:15.6455022Z "digest": "sha256:a514bd1add3164d8d7ca99aa19294c4ed8b97b074635d98714c4f598a959f4cd" 2025-12-04T08:59:15.6455553Z }, 2025-12-04T08:59:15.6455771Z { 2025-12-04T08:59:15.6456147Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6456612Z "size": 724, 2025-12-04T08:59:15.6457070Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:59:15.6457601Z }, 2025-12-04T08:59:15.6457817Z { 2025-12-04T08:59:15.6458181Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6458663Z "size": 134, 2025-12-04T08:59:15.6459129Z "digest": "sha256:57b84ee6000204f27a1d9bca199b19be4c86ecd324540dbdf239c56a6c3b34ea" 2025-12-04T08:59:15.6459662Z }, 2025-12-04T08:59:15.6459875Z { 2025-12-04T08:59:15.6460252Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6460718Z "size": 32, 2025-12-04T08:59:15.6461187Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:15.6461734Z }, 2025-12-04T08:59:15.6461933Z { 2025-12-04T08:59:15.6462307Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6462790Z "size": 157, 2025-12-04T08:59:15.6463267Z "digest": "sha256:b8babeff6d817a5961dddc15c6bdfdbd05da187fae75d5804015f99fd7c066d8" 2025-12-04T08:59:15.6463826Z }, 2025-12-04T08:59:15.6464040Z { 2025-12-04T08:59:15.6464409Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6464878Z "size": 602, 2025-12-04T08:59:15.6465347Z "digest": "sha256:83779ddf6a85ab387f64a45f274cba245b69e4fd1931ff0b5d7d3efd4b7a43bc" 2025-12-04T08:59:15.6465976Z }, 2025-12-04T08:59:15.6466157Z { 2025-12-04T08:59:15.6468491Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6468977Z "size": 724, 2025-12-04T08:59:15.6469380Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:59:15.6469857Z }, 2025-12-04T08:59:15.6470053Z { 2025-12-04T08:59:15.6470381Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6470817Z "size": 155, 2025-12-04T08:59:15.6471238Z "digest": "sha256:8b7620c0d736cc79381207ce5afe2af90f0cd7f0cd394577d2c9520d7f74762f" 2025-12-04T08:59:15.6471730Z }, 2025-12-04T08:59:15.6471915Z { 2025-12-04T08:59:15.6472254Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6472686Z "size": 32, 2025-12-04T08:59:15.6473097Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:15.6473591Z }, 2025-12-04T08:59:15.6473792Z { 2025-12-04T08:59:15.6474123Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6474564Z "size": 188, 2025-12-04T08:59:15.6474994Z "digest": "sha256:3bcfa090e4efd3677425f76baea9f1e0c50a75d8c6b5713ec05310f1dff24539" 2025-12-04T08:59:15.6475474Z }, 
2025-12-04T08:59:15.6475674Z { 2025-12-04T08:59:15.6476013Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6476433Z "size": 1370, 2025-12-04T08:59:15.6476864Z "digest": "sha256:eb0504ec4d9218a79896b604f73dc0ea5a0f96266ad9c2cdbbbe5f0f18222694" 2025-12-04T08:59:15.6477360Z }, 2025-12-04T08:59:15.6477555Z { 2025-12-04T08:59:15.6477876Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6478301Z "size": 32, 2025-12-04T08:59:15.6478893Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:15.6479592Z }, 2025-12-04T08:59:15.6479807Z { 2025-12-04T08:59:15.6480246Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6480716Z "size": 136, 2025-12-04T08:59:15.6481325Z "digest": "sha256:15d0fec09d7b196a1462d51516ee90fc3443ba178d3e56d59cacf32146b4321d" 2025-12-04T08:59:15.6481868Z }, 2025-12-04T08:59:15.6482070Z { 2025-12-04T08:59:15.6482446Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6482924Z "size": 528, 2025-12-04T08:59:15.6483385Z "digest": "sha256:cca81fcc62a949959ca4dd3c9056fb293d548ef8607127eeeef6cfd3a8897ca8" 2025-12-04T08:59:15.6483944Z }, 2025-12-04T08:59:15.6484161Z { 2025-12-04T08:59:15.6484538Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6485006Z "size": 32, 2025-12-04T08:59:15.6485472Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:15.6486019Z }, 2025-12-04T08:59:15.6486220Z { 2025-12-04T08:59:15.6486589Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6487066Z "size": 104, 2025-12-04T08:59:15.6487536Z "digest": "sha256:b0b8f9b5c6ab98db9cd830dc584e1b6aec9add139e4cc48d8c243d36691e25b4" 2025-12-04T08:59:15.6488093Z }, 2025-12-04T08:59:15.6488307Z { 2025-12-04T08:59:15.6488667Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6489146Z "size": 435, 2025-12-04T08:59:15.6489610Z "digest": "sha256:0606ca4d47a8a70e91e92b03ca51a85e731641b09342136a54ef2f2a6d9dfb44" 2025-12-04T08:59:15.6490149Z }, 2025-12-04T08:59:15.6490351Z { 2025-12-04T08:59:15.6490724Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6491206Z "size": 32, 2025-12-04T08:59:15.6491734Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:15.6492224Z }, 2025-12-04T08:59:15.6492417Z { 2025-12-04T08:59:15.6492735Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6493242Z "size": 109, 2025-12-04T08:59:15.6493982Z "digest": "sha256:2f80a4e1b3b95ed67bb781ea787e8a63e46de79117d9d8e65c257072b38afa2d" 2025-12-04T08:59:15.6494524Z }, 2025-12-04T08:59:15.6494745Z { 2025-12-04T08:59:15.6495120Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6495589Z "size": 1896, 2025-12-04T08:59:15.6496061Z "digest": "sha256:35c916fb1bd057e517dcab78c3a2a018e68096d8993892ad84f47562d37ae352" 2025-12-04T08:59:15.6496601Z }, 2025-12-04T08:59:15.6496817Z { 2025-12-04T08:59:15.6497178Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6497661Z "size": 197526165, 2025-12-04T08:59:15.6498139Z "digest": "sha256:195537b7dafc96192f768323b1a8cc2a914d41959849b73198579576b0872a44" 2025-12-04T08:59:15.6498659Z }, 2025-12-04T08:59:15.6498875Z { 2025-12-04T08:59:15.6499248Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6499711Z "size": 106, 2025-12-04T08:59:15.6500176Z "digest": "sha256:dc454fd3967e5735b2498b7f1d958a2c626987d5e4ce225ca98da3cd945b59f3" 2025-12-04T08:59:15.6500722Z }, 2025-12-04T08:59:15.6500924Z { 2025-12-04T08:59:15.6501297Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6501772Z "size": 165, 2025-12-04T08:59:15.6502225Z "digest": "sha256:701b34f115fa897181c046dc37288e87cbc3ad74c36a9e2224b5bfe7c5703afb" 2025-12-04T08:59:15.6502763Z }, 2025-12-04T08:59:15.6502980Z { 2025-12-04T08:59:15.6503353Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6503817Z "size": 7944, 2025-12-04T08:59:15.6504297Z "digest": "sha256:39cefc00ffedebc9098261c798408b87a20c95a88fccb110594077f48dadf760" 2025-12-04T08:59:15.6504844Z }, 2025-12-04T08:59:15.6505046Z { 2025-12-04T08:59:15.6505418Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6505981Z "size": 8071, 2025-12-04T08:59:15.6506388Z "digest": "sha256:6ae51eb61a325b2c2995a5088c81aa20821b75be65b5aa722c7c40556b5d03ea" 2025-12-04T08:59:15.6506872Z }, 2025-12-04T08:59:15.6507063Z { 2025-12-04T08:59:15.6507451Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6507879Z "size": 304, 2025-12-04T08:59:15.6508301Z "digest": "sha256:1fd5341e66dfc0c1ae23af014641a92a6fd02640c528fe6d4dc55921ed659a26" 2025-12-04T08:59:15.6508792Z }, 2025-12-04T08:59:15.6508975Z { 2025-12-04T08:59:15.6509316Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6509750Z "size": 13364291, 2025-12-04T08:59:15.6510175Z "digest": "sha256:72a7c87e35e40ab796f90aee1b51add7902f0cdc44406d2505b6c6a1f55a8da6" 2025-12-04T08:59:15.6510670Z }, 2025-12-04T08:59:15.6510868Z { 2025-12-04T08:59:15.6511192Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6511626Z "size": 108, 2025-12-04T08:59:15.6512057Z "digest": "sha256:ec36862ac98ebaac52ee1a8b1d162d45bd0e3bf59ae7e19c8f80ad3960b4c600" 2025-12-04T08:59:15.6512541Z }, 2025-12-04T08:59:15.6512749Z { 2025-12-04T08:59:15.6513090Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6513515Z "size": 54145699, 2025-12-04T08:59:15.6513956Z "digest": "sha256:05ddbf246e8add0e293474dbf88bb028d5a295a25ac59e8648a18db644377773" 2025-12-04T08:59:15.6514452Z }, 2025-12-04T08:59:15.6514653Z { 2025-12-04T08:59:15.6514976Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:15.6515415Z "size": 32, 2025-12-04T08:59:15.6515839Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:15.6516315Z } 2025-12-04T08:59:15.6516510Z ] 2025-12-04T08:59:15.6516706Z } 2025-12-04T08:59:15.6543827Z ##[group]Run set -eux 2025-12-04T08:59:15.6544142Z set -eux 2025-12-04T08:59:15.6544613Z # It's ok if this steps fails, it would then be an anonymous user like what we used to have 2025-12-04T08:59:15.6546270Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true 2025-12-04T08:59:15.6553327Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:15.6553897Z env: 2025-12-04T08:59:15.6554120Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:15.6554414Z ##[endgroup] 2025-12-04T08:59:15.6582725Z + aws secretsmanager get-secret-value 
--secret-id docker_hub_readonly_token 2025-12-04T08:59:15.6583297Z + jq --raw-output .SecretString 2025-12-04T08:59:15.6584325Z + jq -r .docker_hub_readonly_token 2025-12-04T08:59:15.6585525Z + docker login --username pytorchbot --password-stdin 2025-12-04T08:59:16.2289587Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:59:16.2290300Z Configure a credential helper to remove this warning. See 2025-12-04T08:59:16.2290962Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:59:16.2291407Z 2025-12-04T08:59:16.2291838Z Login Succeeded 2025-12-04T08:59:16.2383102Z ##[group]Run tag=${ECR_DOCKER_IMAGE##*:} 2025-12-04T08:59:16.2383557Z tag=${ECR_DOCKER_IMAGE##*:} 2025-12-04T08:59:16.2384022Z echo "docker pull ghcr.io/pytorch/ci-image:${tag/:/-}" 2025-12-04T08:59:16.2391005Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:16.2391402Z env: 2025-12-04T08:59:16.2391626Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:16.2392497Z ECR_DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:16.2393383Z ##[endgroup] 2025-12-04T08:59:16.2420823Z docker pull ghcr.io/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:16.2467527Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2025-12-04T08:59:16.2467967Z with: 2025-12-04T08:59:16.2468784Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:16.2469914Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:16.2470313Z env: 2025-12-04T08:59:16.2470522Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:16.2470848Z ##[endgroup] 2025-12-04T08:59:16.2508387Z ##[group]Run set -x 2025-12-04T08:59:16.2508662Z set -x 2025-12-04T08:59:16.2508921Z set +e 2025-12-04T08:59:16.2509218Z  2025-12-04T08:59:16.2509435Z login() { 2025-12-04T08:59:16.2509932Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T08:59:16.2510462Z } 2025-12-04T08:59:16.2510677Z  2025-12-04T08:59:16.2510932Z retry () { 2025-12-04T08:59:16.2511213Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-12-04T08:59:16.2511521Z } 2025-12-04T08:59:16.2511735Z  2025-12-04T08:59:16.2511986Z retry login "${DOCKER_REGISTRY}" 2025-12-04T08:59:16.2512286Z  2025-12-04T08:59:16.2512796Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024') 2025-12-04T08:59:16.2513491Z echo "Compressed size of image in MB: ${IMAGE_SIZE}" 2025-12-04T08:59:16.2513877Z  2025-12-04T08:59:16.2514077Z set -e 2025-12-04T08:59:16.2514428Z # ignore output since only exit code is used for conditional 2025-12-04T08:59:16.2514938Z # only pull docker image if it's not available locally 2025-12-04T08:59:16.2515493Z if ! 
docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2025-12-04T08:59:16.2516019Z  retry docker pull "${DOCKER_IMAGE}" 2025-12-04T08:59:16.2516346Z fi 2025-12-04T08:59:16.2521595Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:16.2521984Z env: 2025-12-04T08:59:16.2522207Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:16.2523080Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:16.2524066Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:16.2524467Z ##[endgroup] 2025-12-04T08:59:16.2547280Z + set +e 2025-12-04T08:59:16.2548069Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:16.2548587Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:16.2551345Z + aws ecr get-login-password --region us-east-1 2025-12-04T08:59:16.2552622Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:16.7857971Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:59:16.7859755Z Configure a credential helper to remove this warning. See 2025-12-04T08:59:16.7860579Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:59:16.7861040Z 2025-12-04T08:59:16.7861693Z Login Succeeded 2025-12-04T08:59:16.7881221Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:16.7882359Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024' 2025-12-04T08:59:17.0016875Z + IMAGE_SIZE=15091.581844329834 2025-12-04T08:59:17.0017342Z + echo 'Compressed size of image in MB: 15091.581844329834' 2025-12-04T08:59:17.0017783Z + set -e 2025-12-04T08:59:17.0018079Z Compressed size of image in MB: 15091.581844329834 2025-12-04T08:59:17.0019428Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:17.0140605Z + retry docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:17.0142526Z + docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:17.2161059Z pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a: Pulling from pytorch/ci-image 2025-12-04T08:59:17.2162439Z 63e5bc7682b8: Pulling fs layer 2025-12-04T08:59:17.2162962Z 0678d56345c9: Pulling fs layer 2025-12-04T08:59:17.2163529Z 45f5c9ddfce7: Pulling fs layer 2025-12-04T08:59:17.2164050Z 086b1df51ac1: Pulling fs layer 2025-12-04T08:59:17.2164566Z fe8a7b64bf98: Pulling fs layer 2025-12-04T08:59:17.2165075Z 7680723e9a57: Pulling fs layer 2025-12-04T08:59:17.2165587Z 9c5027aeeb4e: Pulling fs layer 2025-12-04T08:59:17.2166084Z 9a5652110360: Pulling fs layer 2025-12-04T08:59:17.2166638Z 375c4427e914: Pulling fs layer 2025-12-04T08:59:17.2167122Z a86faaa7dbdd: Pulling fs layer 2025-12-04T08:59:17.2167430Z fb7848686804: Pulling fs layer 2025-12-04T08:59:17.2167770Z 3541df015cdb: Pulling fs layer 2025-12-04T08:59:17.2168086Z 79dc80f426b2: Pulling fs layer 
2025-12-04T08:59:17.2168408Z a13fcc1b90bb: Pulling fs layer 2025-12-04T08:59:17.2168715Z 4f4fb700ef54: Pulling fs layer 2025-12-04T08:59:17.2169031Z 549db4d6c618: Pulling fs layer 2025-12-04T08:59:17.2169346Z 5c63528cb580: Pulling fs layer 2025-12-04T08:59:17.2169647Z 75bd83b989a4: Pulling fs layer 2025-12-04T08:59:17.2169964Z de6e78970f51: Pulling fs layer 2025-12-04T08:59:17.2170281Z e13ed7c7e473: Pulling fs layer 2025-12-04T08:59:17.2170644Z 6e2949bcb741: Pulling fs layer 2025-12-04T08:59:17.2170945Z 14d69d9aaec7: Pulling fs layer 2025-12-04T08:59:17.2171257Z 5c02769dd8e5: Pulling fs layer 2025-12-04T08:59:17.2171578Z 35041ce524ac: Pulling fs layer 2025-12-04T08:59:17.2171863Z 7680723e9a57: Waiting 2025-12-04T08:59:17.2172146Z 2fa92dc5885e: Pulling fs layer 2025-12-04T08:59:17.2172449Z 9c5027aeeb4e: Waiting 2025-12-04T08:59:17.2172719Z 2b85eafbd92a: Pulling fs layer 2025-12-04T08:59:17.2173139Z ff755a4ddad7: Pulling fs layer 2025-12-04T08:59:17.2173441Z 9a5652110360: Waiting 2025-12-04T08:59:17.2173890Z 09eb41bdf42d: Pulling fs layer 2025-12-04T08:59:17.2174197Z 375c4427e914: Waiting 2025-12-04T08:59:17.2174482Z 11ede4d59e93: Pulling fs layer 2025-12-04T08:59:17.2174793Z 79dc80f426b2: Waiting 2025-12-04T08:59:17.2175064Z 1283cd8f801a: Pulling fs layer 2025-12-04T08:59:17.2175369Z 14d69d9aaec7: Waiting 2025-12-04T08:59:17.2175654Z 024fa855425f: Pulling fs layer 2025-12-04T08:59:17.2175949Z a13fcc1b90bb: Waiting 2025-12-04T08:59:17.2176225Z 5c02769dd8e5: Waiting 2025-12-04T08:59:17.2176495Z a86faaa7dbdd: Waiting 2025-12-04T08:59:17.2176768Z 303e6747a62e: Pulling fs layer 2025-12-04T08:59:17.2177080Z 4f4fb700ef54: Waiting 2025-12-04T08:59:17.2177356Z 35041ce524ac: Waiting 2025-12-04T08:59:17.2177609Z fb7848686804: Waiting 2025-12-04T08:59:17.2177887Z 3017cdf4838b: Pulling fs layer 2025-12-04T08:59:17.2178198Z 2fa92dc5885e: Waiting 2025-12-04T08:59:17.2178469Z 6b6cd1c358e8: Pulling fs layer 2025-12-04T08:59:17.2179019Z 3541df015cdb: Waiting 2025-12-04T08:59:17.2179303Z b2dd04501124: Pulling fs layer 2025-12-04T08:59:17.2179597Z 549db4d6c618: Waiting 2025-12-04T08:59:17.2179880Z 55adc51fe589: Pulling fs layer 2025-12-04T08:59:17.2180191Z 086b1df51ac1: Waiting 2025-12-04T08:59:17.2180445Z 5c63528cb580: Waiting 2025-12-04T08:59:17.2180715Z e13ed7c7e473: Waiting 2025-12-04T08:59:17.2180984Z de6e78970f51: Waiting 2025-12-04T08:59:17.2181255Z a43ca0e4b837: Pulling fs layer 2025-12-04T08:59:17.2181562Z 75bd83b989a4: Waiting 2025-12-04T08:59:17.2181829Z 6e2949bcb741: Waiting 2025-12-04T08:59:17.2182111Z b7212f17fd14: Pulling fs layer 2025-12-04T08:59:17.2182406Z 2b85eafbd92a: Waiting 2025-12-04T08:59:17.2182679Z ff755a4ddad7: Waiting 2025-12-04T08:59:17.2183161Z 09eb41bdf42d: Waiting 2025-12-04T08:59:17.2183441Z 083e42cac090: Pulling fs layer 2025-12-04T08:59:17.2183764Z 0a00b784a4aa: Pulling fs layer 2025-12-04T08:59:17.2184076Z 11ede4d59e93: Waiting 2025-12-04T08:59:17.2184332Z 303e6747a62e: Waiting 2025-12-04T08:59:17.2184706Z 6b6cd1c358e8: Waiting 2025-12-04T08:59:17.2184992Z c6173c779f7b: Pulling fs layer 2025-12-04T08:59:17.2185290Z b2dd04501124: Waiting 2025-12-04T08:59:17.2185559Z 1283cd8f801a: Waiting 2025-12-04T08:59:17.2185835Z 55adc51fe589: Waiting 2025-12-04T08:59:17.2186094Z 3017cdf4838b: Waiting 2025-12-04T08:59:17.2186381Z ed3d1e3387b9: Pulling fs layer 2025-12-04T08:59:17.2186695Z fe8a7b64bf98: Waiting 2025-12-04T08:59:17.2186951Z 024fa855425f: Waiting 2025-12-04T08:59:17.2187230Z b29343478586: Pulling fs layer 2025-12-04T08:59:17.2187535Z b7212f17fd14: Waiting 
2025-12-04T08:59:17.2187792Z a43ca0e4b837: Waiting 2025-12-04T08:59:17.2188060Z c6173c779f7b: Waiting 2025-12-04T08:59:17.2188339Z c6f0520487fb: Pulling fs layer 2025-12-04T08:59:17.2188649Z 083e42cac090: Waiting 2025-12-04T08:59:17.2188903Z 0a00b784a4aa: Waiting 2025-12-04T08:59:17.2189180Z 148171691cd4: Pulling fs layer 2025-12-04T08:59:17.2189486Z ed3d1e3387b9: Waiting 2025-12-04T08:59:17.2189740Z c6f0520487fb: Waiting 2025-12-04T08:59:17.2190029Z 2c666d30ed77: Pulling fs layer 2025-12-04T08:59:17.2190448Z 148171691cd4: Waiting 2025-12-04T08:59:17.2190709Z 5d8d3a0a98e0: Pulling fs layer 2025-12-04T08:59:17.2191009Z b29343478586: Waiting 2025-12-04T08:59:17.2191271Z 2c666d30ed77: Waiting 2025-12-04T08:59:17.2191538Z b06bafce9e81: Pulling fs layer 2025-12-04T08:59:17.2191858Z 15e0d7e4590d: Pulling fs layer 2025-12-04T08:59:17.2192158Z 5d8d3a0a98e0: Waiting 2025-12-04T08:59:17.2192423Z a514bd1add31: Pulling fs layer 2025-12-04T08:59:17.2192726Z b06bafce9e81: Waiting 2025-12-04T08:59:17.2192987Z 15e0d7e4590d: Waiting 2025-12-04T08:59:17.2193246Z 57b84ee60002: Pulling fs layer 2025-12-04T08:59:17.2193542Z 57b84ee60002: Waiting 2025-12-04T08:59:17.2193818Z b8babeff6d81: Pulling fs layer 2025-12-04T08:59:17.2194125Z 83779ddf6a85: Pulling fs layer 2025-12-04T08:59:17.2194433Z 8b7620c0d736: Pulling fs layer 2025-12-04T08:59:17.2194736Z b8babeff6d81: Waiting 2025-12-04T08:59:17.2195019Z 3bcfa090e4ef: Pulling fs layer 2025-12-04T08:59:17.2195327Z eb0504ec4d92: Pulling fs layer 2025-12-04T08:59:17.2195631Z 3bcfa090e4ef: Waiting 2025-12-04T08:59:17.2195900Z 83779ddf6a85: Waiting 2025-12-04T08:59:17.2196167Z 15d0fec09d7b: Pulling fs layer 2025-12-04T08:59:17.2196472Z eb0504ec4d92: Waiting 2025-12-04T08:59:17.2196755Z cca81fcc62a9: Pulling fs layer 2025-12-04T08:59:17.2197050Z 15d0fec09d7b: Waiting 2025-12-04T08:59:17.2197335Z b0b8f9b5c6ab: Pulling fs layer 2025-12-04T08:59:17.2197653Z 0606ca4d47a8: Pulling fs layer 2025-12-04T08:59:17.2197954Z 2f80a4e1b3b9: Pulling fs layer 2025-12-04T08:59:17.2198271Z 35c916fb1bd0: Pulling fs layer 2025-12-04T08:59:17.2198586Z 195537b7dafc: Pulling fs layer 2025-12-04T08:59:17.2198874Z 0606ca4d47a8: Waiting 2025-12-04T08:59:17.2199147Z cca81fcc62a9: Waiting 2025-12-04T08:59:17.2199435Z dc454fd3967e: Pulling fs layer 2025-12-04T08:59:17.2199723Z 2f80a4e1b3b9: Waiting 2025-12-04T08:59:17.2200000Z 701b34f115fa: Pulling fs layer 2025-12-04T08:59:17.2200298Z 195537b7dafc: Waiting 2025-12-04T08:59:17.2200575Z 35c916fb1bd0: Waiting 2025-12-04T08:59:17.2200826Z dc454fd3967e: Waiting 2025-12-04T08:59:17.2201105Z 39cefc00ffed: Pulling fs layer 2025-12-04T08:59:17.2201420Z 6ae51eb61a32: Pulling fs layer 2025-12-04T08:59:17.2201717Z 1fd5341e66df: Pulling fs layer 2025-12-04T08:59:17.2202026Z 72a7c87e35e4: Pulling fs layer 2025-12-04T08:59:17.2202332Z ec36862ac98e: Pulling fs layer 2025-12-04T08:59:17.2202632Z 05ddbf246e8a: Pulling fs layer 2025-12-04T08:59:17.2202934Z 1fd5341e66df: Waiting 2025-12-04T08:59:17.2203195Z 72a7c87e35e4: Waiting 2025-12-04T08:59:17.2203443Z 701b34f115fa: Waiting 2025-12-04T08:59:17.2203705Z 05ddbf246e8a: Waiting 2025-12-04T08:59:17.2203969Z ec36862ac98e: Waiting 2025-12-04T08:59:17.2204221Z 39cefc00ffed: Waiting 2025-12-04T08:59:17.3225113Z 0678d56345c9: Verifying Checksum 2025-12-04T08:59:17.3225635Z 0678d56345c9: Download complete 2025-12-04T08:59:17.4142373Z 086b1df51ac1: Verifying Checksum 2025-12-04T08:59:17.4142774Z 086b1df51ac1: Download complete 2025-12-04T08:59:17.4883772Z fe8a7b64bf98: Download complete 2025-12-04T08:59:17.5616585Z 
63e5bc7682b8: Verifying Checksum 2025-12-04T08:59:17.5616960Z 63e5bc7682b8: Download complete 2025-12-04T08:59:17.5648656Z 7680723e9a57: Verifying Checksum 2025-12-04T08:59:17.5649735Z 7680723e9a57: Download complete 2025-12-04T08:59:17.6348094Z 9c5027aeeb4e: Verifying Checksum 2025-12-04T08:59:17.6348690Z 9c5027aeeb4e: Download complete 2025-12-04T08:59:17.6381202Z 9a5652110360: Verifying Checksum 2025-12-04T08:59:17.7038076Z 9a5652110360: Download complete 2025-12-04T08:59:17.7038701Z a86faaa7dbdd: Download complete 2025-12-04T08:59:17.7659410Z fb7848686804: Verifying Checksum 2025-12-04T08:59:17.7660015Z fb7848686804: Download complete 2025-12-04T08:59:17.8552242Z 3541df015cdb: Verifying Checksum 2025-12-04T08:59:17.8552701Z 3541df015cdb: Download complete 2025-12-04T08:59:17.9147869Z 79dc80f426b2: Verifying Checksum 2025-12-04T08:59:17.9148291Z 79dc80f426b2: Download complete 2025-12-04T08:59:18.3708004Z 63e5bc7682b8: Pull complete 2025-12-04T08:59:18.3930799Z 0678d56345c9: Pull complete 2025-12-04T08:59:18.7819565Z 375c4427e914: Verifying Checksum 2025-12-04T08:59:18.7819975Z 375c4427e914: Download complete 2025-12-04T08:59:18.7900667Z 4f4fb700ef54: Verifying Checksum 2025-12-04T08:59:18.7901049Z 4f4fb700ef54: Download complete 2025-12-04T08:59:18.8613209Z 549db4d6c618: Download complete 2025-12-04T08:59:18.9503537Z 5c63528cb580: Download complete 2025-12-04T08:59:19.0142696Z 75bd83b989a4: Verifying Checksum 2025-12-04T08:59:19.0143097Z 75bd83b989a4: Download complete 2025-12-04T08:59:19.1215278Z de6e78970f51: Verifying Checksum 2025-12-04T08:59:19.1215688Z de6e78970f51: Download complete 2025-12-04T08:59:19.1868875Z e13ed7c7e473: Download complete 2025-12-04T08:59:19.2885111Z 6e2949bcb741: Download complete 2025-12-04T08:59:19.3592562Z 14d69d9aaec7: Verifying Checksum 2025-12-04T08:59:19.3592978Z 14d69d9aaec7: Download complete 2025-12-04T08:59:19.4433118Z 5c02769dd8e5: Verifying Checksum 2025-12-04T08:59:19.4433520Z 5c02769dd8e5: Download complete 2025-12-04T08:59:20.3893532Z 45f5c9ddfce7: Verifying Checksum 2025-12-04T08:59:20.3894318Z 45f5c9ddfce7: Download complete 2025-12-04T08:59:20.7093384Z 2fa92dc5885e: Download complete 2025-12-04T08:59:21.2670706Z 2b85eafbd92a: Verifying Checksum 2025-12-04T08:59:21.3295766Z ff755a4ddad7: Download complete 2025-12-04T08:59:21.4056900Z 09eb41bdf42d: Verifying Checksum 2025-12-04T08:59:21.4057317Z 09eb41bdf42d: Download complete 2025-12-04T08:59:26.0428145Z 11ede4d59e93: Verifying Checksum 2025-12-04T08:59:26.0428618Z 11ede4d59e93: Download complete 2025-12-04T08:59:26.1102712Z 1283cd8f801a: Verifying Checksum 2025-12-04T08:59:26.1103119Z 1283cd8f801a: Download complete 2025-12-04T08:59:26.2084566Z 024fa855425f: Download complete 2025-12-04T08:59:26.2697873Z 303e6747a62e: Download complete 2025-12-04T08:59:26.3562638Z 3017cdf4838b: Verifying Checksum 2025-12-04T08:59:26.3563063Z 3017cdf4838b: Download complete 2025-12-04T08:59:26.5960207Z 6b6cd1c358e8: Verifying Checksum 2025-12-04T08:59:26.5960609Z 6b6cd1c358e8: Download complete 2025-12-04T08:59:26.6544845Z b2dd04501124: Verifying Checksum 2025-12-04T08:59:26.6545269Z b2dd04501124: Download complete 2025-12-04T08:59:26.7248931Z 55adc51fe589: Download complete 2025-12-04T08:59:26.8027258Z a43ca0e4b837: Verifying Checksum 2025-12-04T08:59:26.8027653Z a43ca0e4b837: Download complete 2025-12-04T08:59:26.8748395Z b7212f17fd14: Verifying Checksum 2025-12-04T08:59:26.8748836Z b7212f17fd14: Download complete 2025-12-04T08:59:26.9643067Z 083e42cac090: Download complete 2025-12-04T08:59:27.0633664Z 
0a00b784a4aa: Verifying Checksum 2025-12-04T08:59:27.0634075Z 0a00b784a4aa: Download complete 2025-12-04T08:59:27.1368323Z c6173c779f7b: Verifying Checksum 2025-12-04T08:59:27.1368732Z c6173c779f7b: Download complete 2025-12-04T08:59:27.3841133Z 45f5c9ddfce7: Pull complete 2025-12-04T08:59:27.4054952Z 086b1df51ac1: Pull complete 2025-12-04T08:59:27.4459424Z fe8a7b64bf98: Pull complete 2025-12-04T08:59:27.4955246Z 7680723e9a57: Pull complete 2025-12-04T08:59:27.5260788Z 9c5027aeeb4e: Pull complete 2025-12-04T08:59:27.5497615Z 9a5652110360: Pull complete 2025-12-04T08:59:28.6385670Z ed3d1e3387b9: Verifying Checksum 2025-12-04T08:59:28.6386082Z ed3d1e3387b9: Download complete 2025-12-04T08:59:28.7106413Z b29343478586: Verifying Checksum 2025-12-04T08:59:28.7106925Z b29343478586: Download complete 2025-12-04T08:59:30.0300334Z 375c4427e914: Pull complete 2025-12-04T08:59:30.5439212Z a86faaa7dbdd: Pull complete 2025-12-04T08:59:30.8902736Z fb7848686804: Pull complete 2025-12-04T08:59:31.1429612Z 3541df015cdb: Pull complete 2025-12-04T08:59:31.4244239Z 79dc80f426b2: Pull complete 2025-12-04T08:59:31.8804419Z c6f0520487fb: Verifying Checksum 2025-12-04T08:59:31.8804850Z c6f0520487fb: Download complete 2025-12-04T08:59:49.8212047Z a13fcc1b90bb: Verifying Checksum 2025-12-04T08:59:49.8212524Z a13fcc1b90bb: Download complete 2025-12-04T08:59:49.9291537Z 2c666d30ed77: Verifying Checksum 2025-12-04T08:59:49.9291971Z 2c666d30ed77: Download complete 2025-12-04T08:59:49.9841325Z 5d8d3a0a98e0: Download complete 2025-12-04T08:59:50.0576310Z b06bafce9e81: Verifying Checksum 2025-12-04T08:59:50.0576742Z b06bafce9e81: Download complete 2025-12-04T08:59:50.1320230Z 15e0d7e4590d: Verifying Checksum 2025-12-04T08:59:50.1320619Z 15e0d7e4590d: Download complete 2025-12-04T08:59:50.2326976Z a514bd1add31: Verifying Checksum 2025-12-04T08:59:50.2327379Z a514bd1add31: Download complete 2025-12-04T08:59:50.3070580Z 57b84ee60002: Download complete 2025-12-04T08:59:50.3748411Z b8babeff6d81: Download complete 2025-12-04T08:59:50.4404944Z 83779ddf6a85: Verifying Checksum 2025-12-04T08:59:50.4405321Z 83779ddf6a85: Download complete 2025-12-04T08:59:50.5254951Z 8b7620c0d736: Download complete 2025-12-04T08:59:50.6134010Z 3bcfa090e4ef: Verifying Checksum 2025-12-04T08:59:50.6134460Z 3bcfa090e4ef: Download complete 2025-12-04T08:59:50.6681151Z eb0504ec4d92: Download complete 2025-12-04T08:59:50.7421809Z 15d0fec09d7b: Verifying Checksum 2025-12-04T08:59:50.7422481Z 15d0fec09d7b: Download complete 2025-12-04T08:59:50.8158377Z cca81fcc62a9: Verifying Checksum 2025-12-04T08:59:50.8158898Z cca81fcc62a9: Download complete 2025-12-04T08:59:50.9187759Z b0b8f9b5c6ab: Verifying Checksum 2025-12-04T08:59:50.9188236Z b0b8f9b5c6ab: Download complete 2025-12-04T08:59:50.9935264Z 0606ca4d47a8: Verifying Checksum 2025-12-04T08:59:50.9935673Z 0606ca4d47a8: Download complete 2025-12-04T08:59:51.0920809Z 2f80a4e1b3b9: Verifying Checksum 2025-12-04T08:59:51.0921448Z 2f80a4e1b3b9: Download complete 2025-12-04T08:59:51.1545562Z 35c916fb1bd0: Download complete 2025-12-04T08:59:53.1800380Z 195537b7dafc: Verifying Checksum 2025-12-04T08:59:53.1800801Z 195537b7dafc: Download complete 2025-12-04T08:59:53.2571266Z dc454fd3967e: Download complete 2025-12-04T08:59:53.3257190Z 701b34f115fa: Verifying Checksum 2025-12-04T08:59:53.3257638Z 701b34f115fa: Download complete 2025-12-04T08:59:53.3933018Z 39cefc00ffed: Download complete 2025-12-04T08:59:53.4758520Z 6ae51eb61a32: Download complete 2025-12-04T08:59:53.5665041Z 1fd5341e66df: Verifying Checksum 
2025-12-04T08:59:53.8053759Z 1fd5341e66df: Download complete 2025-12-04T08:59:53.8054362Z 72a7c87e35e4: Verifying Checksum 2025-12-04T08:59:53.8055138Z 72a7c87e35e4: Download complete 2025-12-04T08:59:53.8922394Z ec36862ac98e: Verifying Checksum 2025-12-04T08:59:53.8922828Z ec36862ac98e: Download complete 2025-12-04T08:59:54.4960457Z 05ddbf246e8a: Verifying Checksum 2025-12-04T08:59:54.4960893Z 05ddbf246e8a: Download complete 2025-12-04T09:00:02.5059033Z 148171691cd4: Verifying Checksum 2025-12-04T09:00:02.5059727Z 148171691cd4: Download complete 2025-12-04T09:00:35.5384792Z a13fcc1b90bb: Pull complete 2025-12-04T09:00:35.9863547Z 4f4fb700ef54: Pull complete 2025-12-04T09:00:36.2983197Z 549db4d6c618: Pull complete 2025-12-04T09:00:36.4914617Z 5c63528cb580: Pull complete 2025-12-04T09:00:36.8356372Z 75bd83b989a4: Pull complete 2025-12-04T09:00:37.2998509Z de6e78970f51: Pull complete 2025-12-04T09:00:37.6884131Z e13ed7c7e473: Pull complete 2025-12-04T09:00:38.0986055Z 6e2949bcb741: Pull complete 2025-12-04T09:00:38.3851652Z 35041ce524ac: Verifying Checksum 2025-12-04T09:00:38.3852992Z 35041ce524ac: Download complete 2025-12-04T09:00:38.4540469Z 14d69d9aaec7: Pull complete 2025-12-04T09:00:38.8679096Z 5c02769dd8e5: Pull complete 2025-12-04T09:01:51.2229086Z 35041ce524ac: Pull complete 2025-12-04T09:01:51.5878310Z 2fa92dc5885e: Pull complete 2025-12-04T09:01:52.4978538Z 2b85eafbd92a: Pull complete 2025-12-04T09:01:52.9375687Z ff755a4ddad7: Pull complete 2025-12-04T09:01:53.2916485Z 09eb41bdf42d: Pull complete 2025-12-04T09:02:01.0234503Z 11ede4d59e93: Pull complete 2025-12-04T09:02:01.4628496Z 1283cd8f801a: Pull complete 2025-12-04T09:02:01.8095620Z 024fa855425f: Pull complete 2025-12-04T09:02:02.0892901Z 303e6747a62e: Pull complete 2025-12-04T09:02:02.1137146Z 3017cdf4838b: Pull complete 2025-12-04T09:02:02.3223637Z 6b6cd1c358e8: Pull complete 2025-12-04T09:02:02.3468484Z b2dd04501124: Pull complete 2025-12-04T09:02:02.3716047Z 55adc51fe589: Pull complete 2025-12-04T09:02:02.4152326Z a43ca0e4b837: Pull complete 2025-12-04T09:02:02.4385288Z b7212f17fd14: Pull complete 2025-12-04T09:02:02.4616106Z 083e42cac090: Pull complete 2025-12-04T09:02:02.5071343Z 0a00b784a4aa: Pull complete 2025-12-04T09:02:02.5315597Z c6173c779f7b: Pull complete 2025-12-04T09:02:05.4843952Z ed3d1e3387b9: Pull complete 2025-12-04T09:02:05.4999230Z b29343478586: Pull complete 2025-12-04T09:02:06.9002040Z c6f0520487fb: Pull complete 2025-12-04T09:02:58.8804763Z 148171691cd4: Pull complete 2025-12-04T09:02:59.2536320Z 2c666d30ed77: Pull complete 2025-12-04T09:02:59.6113259Z 5d8d3a0a98e0: Pull complete 2025-12-04T09:03:00.4454965Z b06bafce9e81: Pull complete 2025-12-04T09:03:01.3601678Z 15e0d7e4590d: Pull complete 2025-12-04T09:03:01.8030070Z a514bd1add31: Pull complete 2025-12-04T09:03:02.7256125Z 57b84ee60002: Pull complete 2025-12-04T09:03:03.4368472Z b8babeff6d81: Pull complete 2025-12-04T09:03:03.7937941Z 83779ddf6a85: Pull complete 2025-12-04T09:03:04.7005317Z 8b7620c0d736: Pull complete 2025-12-04T09:03:05.6324494Z 3bcfa090e4ef: Pull complete 2025-12-04T09:03:05.9954606Z eb0504ec4d92: Pull complete 2025-12-04T09:03:06.5724593Z 15d0fec09d7b: Pull complete 2025-12-04T09:03:06.8719726Z cca81fcc62a9: Pull complete 2025-12-04T09:03:07.4723582Z b0b8f9b5c6ab: Pull complete 2025-12-04T09:03:07.6742589Z 0606ca4d47a8: Pull complete 2025-12-04T09:03:08.4352451Z 2f80a4e1b3b9: Pull complete 2025-12-04T09:03:08.8595992Z 35c916fb1bd0: Pull complete 2025-12-04T09:03:15.0972006Z 195537b7dafc: Pull complete 2025-12-04T09:03:15.5353010Z 
dc454fd3967e: Pull complete 2025-12-04T09:03:15.9765090Z 701b34f115fa: Pull complete 2025-12-04T09:03:16.3893652Z 39cefc00ffed: Pull complete 2025-12-04T09:03:16.8271136Z 6ae51eb61a32: Pull complete 2025-12-04T09:03:17.2809240Z 1fd5341e66df: Pull complete 2025-12-04T09:03:18.8379183Z 72a7c87e35e4: Pull complete 2025-12-04T09:03:19.1691907Z ec36862ac98e: Pull complete 2025-12-04T09:03:21.0200118Z 05ddbf246e8a: Pull complete 2025-12-04T09:03:21.7397704Z Digest: sha256:ba21003510dba4bdeed83df81a56fa468e0ee1b612a9445ae1f402a280804f97 2025-12-04T09:03:21.8341963Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:03:21.8828822Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:03:21.8885797Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:21.8886806Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:21.8896098Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:03:21.8896545Z env: 2025-12-04T09:03:21.8896784Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:21.8897088Z ##[endgroup] 2025-12-04T09:03:21.9086671Z ##[group]Run pytorch/test-infra/.github/actions/setup-nvidia@main 2025-12-04T09:03:21.9087312Z with: 2025-12-04T09:03:21.9087554Z driver-version: 580.82.07 2025-12-04T09:03:21.9087847Z env: 2025-12-04T09:03:21.9088088Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:21.9088371Z ##[endgroup] 2025-12-04T09:03:21.9151657Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:21.9152608Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:21.9158357Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:03:21.9158749Z env: 2025-12-04T09:03:21.9158975Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:21.9159232Z ##[endgroup] 2025-12-04T09:03:21.9325348Z ##[group]Run set -euo pipefail 2025-12-04T09:03:21.9325713Z set -euo pipefail 2025-12-04T09:03:21.9326045Z  2025-12-04T09:03:21.9326275Z has_gpu=false 2025-12-04T09:03:21.9326566Z devices="" 2025-12-04T09:03:21.9326830Z  2025-12-04T09:03:21.9327130Z if command -v nvidia-smi >/dev/null 2>&1; then 2025-12-04T09:03:21.9327641Z  if nvidia-smi -L >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:03:21.9328185Z  has_gpu=true 2025-12-04T09:03:21.9328510Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:03:21.9328849Z  fi 2025-12-04T09:03:21.9329081Z fi 2025-12-04T09:03:21.9329308Z  2025-12-04T09:03:21.9329539Z if [ "$has_gpu" = false ]; then 2025-12-04T09:03:21.9329966Z  if ls /dev/nvidia* >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:03:21.9367919Z  has_gpu=true 2025-12-04T09:03:21.9368418Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:03:21.9368883Z  fi 2025-12-04T09:03:21.9369178Z fi 2025-12-04T09:03:21.9369478Z  2025-12-04T09:03:21.9369899Z if [ "$has_gpu" = false ] && command -v lspci >/dev/null 2>&1; then 2025-12-04T09:03:21.9370504Z  if lspci | grep -i 'nvidia' >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:03:21.9370936Z  has_gpu=true 
2025-12-04T09:03:21.9371223Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:03:21.9371531Z  fi 2025-12-04T09:03:21.9371727Z fi 2025-12-04T09:03:21.9371913Z  2025-12-04T09:03:21.9372199Z printf 'HAS_NVIDIA=%s\n' "$has_gpu" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:21.9372736Z printf 'DETECTED_DEVICES<> "$GITHUB_OUTPUT" 2025-12-04T09:03:21.9379178Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:03:21.9379612Z env: 2025-12-04T09:03:21.9379837Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:21.9380118Z ##[endgroup] 2025-12-04T09:03:25.0741840Z ##[group]Run if [ "${HAS_NVIDIA}" = "true" ]; then 2025-12-04T09:03:25.0742371Z if [ "${HAS_NVIDIA}" = "true" ]; then 2025-12-04T09:03:25.0742815Z  echo "HAS_NVIDIA_GPU=true" >> "${GITHUB_ENV}" 2025-12-04T09:03:25.0743425Z  echo "GPU_FLAG=--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all" >> "${GITHUB_ENV}" 2025-12-04T09:03:25.0743970Z else 2025-12-04T09:03:25.0744284Z  echo "HAS_NVIDIA_GPU=false" >> "${GITHUB_ENV}" 2025-12-04T09:03:25.0744681Z fi 2025-12-04T09:03:25.0752239Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:03:25.0752650Z env: 2025-12-04T09:03:25.0752887Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:25.0753172Z HAS_NVIDIA: true 2025-12-04T09:03:25.0753421Z ##[endgroup] 2025-12-04T09:03:25.0958061Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 2025-12-04T09:03:25.0958498Z with: 2025-12-04T09:03:25.0958706Z timeout_minutes: 10 2025-12-04T09:03:25.0958960Z max_attempts: 3 2025-12-04T09:03:25.0990470Z command: # Is it disgusting to have a full shell script here in this github action? Sure # But is it the best way to make it so that this action relies on nothing else? Absolutely set -eou pipefail DISTRIBUTION=$(. /etc/os-release;echo $ID$VERSION_ID) DRIVER_FN="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run" install_nvidia_docker2_amzn2() { ( set -x # Needed for yum-config-manager sudo yum install -y yum-utils if [[ "${DISTRIBUTION}" == "amzn2023" ]] ; then YUM_REPO_URL="https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo" else # Amazon Linux 2 YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo" fi sudo yum-config-manager --add-repo "${YUM_REPO_URL}" sudo yum install -y \ nvidia-container-toolkit-1.17.8 \ libnvidia-container-tools-1.17.8 \ libnvidia-container1-1.17.8 \ nvidia-container-toolkit-base-1.17.8 sudo systemctl restart docker ) } install_nvidia_docker2_ubuntu20() { ( set -x # Install nvidia-driver package if not installed status="$(dpkg-query -W --showformat='${db:Status-Status}' nvidia-docker2 2>&1)" if [ ! $? = 0 ] || [ ! "$status" = installed ]; then sudo apt-get install -y nvidia-container-toolkit-1.17.8 sudo systemctl restart docker fi ) } pre_install_nvidia_driver_amzn2() { ( # Purge any nvidia driver installed from RHEL repo sudo yum remove -y nvidia-driver-latest-dkms ) } install_nvidia_driver_common() { ( # Try to gather more information about the runner and its existing NVIDIA driver if any echo "Before installing NVIDIA driver" lspci lsmod modinfo nvidia || true HAS_NVIDIA_DRIVER=0 # Check if NVIDIA driver has already been installed if [ -x "$(command -v nvidia-smi)" ]; then set +e # The driver exists, check its version next. Also check only the first GPU if there are more than one of them # so that the same driver version is not print over multiple lines INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? 
if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then echo "Failed to get NVIDIA driver version ($INSTALLED_DRIVER_VERSION). Continuing" elif [ "$INSTALLED_DRIVER_VERSION" != "$DRIVER_VERSION" ]; then echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has been installed, but we expect to have $DRIVER_VERSION instead. Continuing" # Turn off persistent mode so that the installation script can unload the kernel module sudo killall nvidia-persistenced || true else HAS_NVIDIA_DRIVER=1 echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has already been installed. Skipping NVIDIA driver installation" fi set -e fi if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then # CAUTION: this may need to be updated in future if [ "${DISTRIBUTION}" != ubuntu20.04 ]; then sudo yum groupinstall -y "Development Tools" # ensure our kernel install is the same as our underlying kernel, # groupinstall "Development Tools" has a habit of mismatching kernel headers sudo yum install -y "kernel-devel-uname-r == $(uname -r)" sudo modprobe backlight fi sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN" set +e sudo /bin/bash /tmp/nvidia_driver -s --no-drm NVIDIA_INSTALLATION_STATUS=$? RESET_GPU=0 if [ "$NVIDIA_INSTALLATION_STATUS" -ne 0 ]; then sudo cat /var/log/nvidia-installer.log # Fail to install NVIDIA driver, try to reset the GPU RESET_GPU=1 elif [ -x "$(command -v nvidia-smi)" ]; then # Check again if nvidia-smi works even if the driver installation completes successfully INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then RESET_GPU=1 fi fi if [ "$RESET_GPU" -eq 1 ]; then NVIDIA_DEVICES=$(lspci -D | grep -i NVIDIA | cut -d' ' -f1) # The GPU can get stuck in a failure state if somehow the test crashs the GPU microcode. When this # happens, we'll try to reset all NVIDIA devices https://github.com/pytorch/pytorch/issues/88388 for PCI_ID in $NVIDIA_DEVICES; do DEVICE_ENABLED=$(cat /sys/bus/pci/devices/$PCI_ID/enable) echo "Reseting $PCI_ID (enabled state: $DEVICE_ENABLED)" # This requires sudo permission of course echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset sleep 1 done fi sudo rm -fv /tmp/nvidia_driver set -e fi ) } post_install_nvidia_driver_common() { ( sudo modprobe nvidia || true echo "After installing NVIDIA driver" lspci lsmod modinfo nvidia || true ( set +e nvidia-smi # NB: Annoyingly, nvidia-smi command returns successfully with return code 0 even in # the case where the driver has already crashed as it still can get the driver version # and some basic information like the bus ID. However, the rest of the information # would be missing (ERR!), for example: # # +-----------------------------------------------------------------------------+ # | NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 | # |-------------------------------+----------------------+----------------------+ # | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | # | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | # | | | MIG M. | # |===============================+======================+======================| # | 0 ERR! Off | 00000000:00:1E.0 Off | ERR! | # |ERR! ERR! ERR! ERR! / ERR! | 4184MiB / 23028MiB | ERR! Default | # | | | ERR! 
| # +-------------------------------+----------------------+----------------------+ # # +-----------------------------------------------------------------------------+ # | Processes: | # | GPU GI CI PID Type Process name GPU Memory | # | ID ID Usage | # |=============================================================================| # +-----------------------------------------------------------------------------+ # # This should be reported as a failure instead as it will guarantee to fail when # Docker tries to run with --gpus all # # So, the correct check here is to query one of the missing piece of info like # GPU name, so that the command can fail accordingly nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 NVIDIA_SMI_STATUS=$? # Allowable exit statuses for nvidia-smi, see: https://github.com/NVIDIA/gpu-operator/issues/285 if [ "$NVIDIA_SMI_STATUS" -eq 0 ] || [ "$NVIDIA_SMI_STATUS" -eq 14 ]; then echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}" else echo "ERROR: nvidia-smi exited with unresolved status ${NVIDIA_SMI_STATUS}" exit ${NVIDIA_SMI_STATUS} fi set -e ) ) } install_nvidia_driver_amzn2() { ( set -x pre_install_nvidia_driver_amzn2 install_nvidia_driver_common post_install_nvidia_driver_common ) } install_nvidia_driver_ubuntu20() { ( set -x install_nvidia_driver_common post_install_nvidia_driver_common ) } echo "== Installing nvidia driver ${DRIVER_FN} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_driver_amzn2 ;; ubuntu20.04) install_nvidia_driver_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac # Install container toolkit based on distribution echo "== Installing nvidia container toolkit for ${DISTRIBUTION} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_docker2_amzn2 ;; ubuntu20.04) install_nvidia_docker2_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac # Fix https://github.com/NVIDIA/nvidia-docker/issues/1648 on runners with # more than one GPUs. This just needs to be run once. The command fails # on subsequent runs and complains that the mode is already on, but that's # ok sudo nvidia-persistenced || true # This should show persistence mode ON nvidia-smi # check if the container-toolkit is correctly installed and CUDA is available inside a container docker run --rm -t --gpus=all public.ecr.aws/docker/library/python:3.13 nvidia-smi 2025-12-04T09:03:25.1020667Z retry_wait_seconds: 10 2025-12-04T09:03:25.1020988Z polling_interval_seconds: 1 2025-12-04T09:03:25.1021300Z warning_on_retry: true 2025-12-04T09:03:25.1021606Z continue_on_error: false 2025-12-04T09:03:25.1021900Z env: 2025-12-04T09:03:25.1022128Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:25.1022429Z HAS_NVIDIA_GPU: true 2025-12-04T09:03:25.1022790Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:03:25.1023203Z DRIVER_VERSION: 580.82.07 2025-12-04T09:03:25.1023502Z ##[endgroup] 2025-12-04T09:03:25.2219738Z == Installing nvidia driver NVIDIA-Linux-x86_64-580.82.07.run == 2025-12-04T09:03:25.2220826Z + pre_install_nvidia_driver_amzn2 2025-12-04T09:03:25.2221756Z + sudo yum remove -y nvidia-driver-latest-dkms 2025-12-04T09:03:25.8503169Z No match for argument: nvidia-driver-latest-dkms 2025-12-04T09:03:25.8503684Z No packages marked for removal. 2025-12-04T09:03:25.8571475Z Dependencies resolved. 2025-12-04T09:03:25.8580155Z Nothing to do. 2025-12-04T09:03:25.8580894Z Complete! 
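Note: the flattened setup-nvidia command above explains (in its comments) why a bare nvidia-smi exit code is not trusted: the tool can return 0 even when the driver has crashed and every field reads ERR!. A minimal standalone sketch of that health check, re-extracted from the one-line listing above for readability (the `14` allowlist and the gpu_name query come directly from that script), might look like:

#!/usr/bin/env bash
# Hedged sketch: readable re-extraction of the post-install health check embedded
# in the flattened retry command above. Exit code 14 is treated as benign, matching
# the allowlist referenced there (https://github.com/NVIDIA/gpu-operator/issues/285).
set -uo pipefail

# nvidia-smi can exit 0 even when the driver has crashed (fields show ERR!),
# so query a concrete field; the query fails if the driver is unhealthy.
nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
status=$?

if [ "$status" -eq 0 ] || [ "$status" -eq 14 ]; then
  echo "INFO: Ignoring allowed status ${status}"
else
  echo "ERROR: nvidia-smi exited with unresolved status ${status}"
  exit "${status}"
fi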
2025-12-04T09:03:25.9887908Z + install_nvidia_driver_common 2025-12-04T09:03:25.9890245Z + echo 'Before installing NVIDIA driver' 2025-12-04T09:03:25.9891779Z + lspci 2025-12-04T09:03:25.9893700Z Before installing NVIDIA driver 2025-12-04T09:03:26.1214370Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-12-04T09:03:26.1215066Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-12-04T09:03:26.1215754Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-12-04T09:03:26.1216418Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2025-12-04T09:03:26.1217017Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller 2025-12-04T09:03:26.1217685Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-12-04T09:03:26.1218294Z 00:1b.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:26.1218871Z 00:1c.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:26.1219447Z 00:1d.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:26.1220269Z 00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:26.1221027Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller 2025-12-04T09:03:26.1221526Z + lsmod 2025-12-04T09:03:26.1253506Z Module Size Used by 2025-12-04T09:03:26.1254092Z nvidia_uvm 1925120 0 2025-12-04T09:03:26.1254504Z nvidia 14286848 1 nvidia_uvm 2025-12-04T09:03:26.1254868Z drm 602112 1 nvidia 2025-12-04T09:03:26.1255252Z drm_panel_orientation_quirks 32768 1 drm 2025-12-04T09:03:26.1255621Z backlight 24576 1 drm 2025-12-04T09:03:26.1255966Z i2c_core 110592 2 nvidia,drm 2025-12-04T09:03:26.1256323Z xt_conntrack 16384 1 2025-12-04T09:03:26.1256629Z nft_chain_nat 16384 3 2025-12-04T09:03:26.1256942Z xt_MASQUERADE 20480 1 2025-12-04T09:03:26.1257306Z nf_nat 57344 2 nft_chain_nat,xt_MASQUERADE 2025-12-04T09:03:26.1257714Z nf_conntrack_netlink 57344 0 2025-12-04T09:03:26.1258212Z nf_conntrack 184320 4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE 2025-12-04T09:03:26.1258771Z nf_defrag_ipv6 24576 1 nf_conntrack 2025-12-04T09:03:26.1259164Z nf_defrag_ipv4 16384 1 nf_conntrack 2025-12-04T09:03:26.1259516Z xfrm_user 57344 1 2025-12-04T09:03:26.1259948Z xfrm_algo 16384 1 xfrm_user 2025-12-04T09:03:26.1260404Z xt_addrtype 16384 2 2025-12-04T09:03:26.1260688Z nft_compat 20480 4 2025-12-04T09:03:26.1261043Z nf_tables 311296 57 nft_compat,nft_chain_nat 2025-12-04T09:03:26.1261529Z nfnetlink 20480 4 nft_compat,nf_conntrack_netlink,nf_tables 2025-12-04T09:03:26.1261954Z br_netfilter 36864 0 2025-12-04T09:03:26.1262276Z bridge 323584 1 br_netfilter 2025-12-04T09:03:26.1262617Z stp 16384 1 bridge 2025-12-04T09:03:26.1262932Z llc 16384 2 bridge,stp 2025-12-04T09:03:26.1263267Z overlay 167936 0 2025-12-04T09:03:26.1263556Z tls 139264 0 2025-12-04T09:03:26.1263847Z nls_ascii 16384 1 2025-12-04T09:03:26.1264126Z nls_cp437 20480 1 2025-12-04T09:03:26.1264413Z sunrpc 700416 1 2025-12-04T09:03:26.1264697Z vfat 24576 1 2025-12-04T09:03:26.1264972Z fat 86016 1 vfat 2025-12-04T09:03:26.1265293Z ena 184320 0 2025-12-04T09:03:26.1265562Z i8042 45056 0 2025-12-04T09:03:26.1265856Z serio 28672 3 i8042 2025-12-04T09:03:26.1266180Z skx_edac_common 28672 0 2025-12-04T09:03:26.1266466Z button 24576 0 2025-12-04T09:03:26.1266765Z ghash_clmulni_intel 16384 0 2025-12-04T09:03:26.1267072Z sch_fq_codel 20480 33 2025-12-04T09:03:26.1267360Z dm_mod 
188416 0 2025-12-04T09:03:26.1267652Z fuse 184320 1 2025-12-04T09:03:26.1267942Z loop 36864 0 2025-12-04T09:03:26.1268238Z configfs 57344 1 2025-12-04T09:03:26.1268519Z dmi_sysfs 20480 0 2025-12-04T09:03:26.1268822Z crc32_pclmul 16384 0 2025-12-04T09:03:26.1269126Z crc32c_intel 24576 0 2025-12-04T09:03:26.1269573Z efivarfs 24576 1 2025-12-04T09:03:26.1269873Z + modinfo nvidia 2025-12-04T09:03:26.1270570Z filename: /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko 2025-12-04T09:03:26.1271204Z import_ns: DMA_BUF 2025-12-04T09:03:26.1271488Z alias: char-major-195-* 2025-12-04T09:03:26.1271806Z version: 580.82.07 2025-12-04T09:03:26.1272099Z supported: external 2025-12-04T09:03:26.1272381Z license: Dual MIT/GPL 2025-12-04T09:03:26.1272721Z firmware: nvidia/580.82.07/gsp_tu10x.bin 2025-12-04T09:03:26.1273123Z firmware: nvidia/580.82.07/gsp_ga10x.bin 2025-12-04T09:03:26.1273509Z srcversion: BA7240A71DCF7DC6FE88C1D 2025-12-04T09:03:26.1274027Z alias: of:N*T*Cnvidia,tegra264-displayC* 2025-12-04T09:03:26.1274451Z alias: of:N*T*Cnvidia,tegra264-display 2025-12-04T09:03:26.1274939Z alias: of:N*T*Cnvidia,tegra234-displayC* 2025-12-04T09:03:26.1275344Z alias: of:N*T*Cnvidia,tegra234-display 2025-12-04T09:03:26.1275753Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2025-12-04T09:03:26.1276157Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2025-12-04T09:03:26.1276542Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2025-12-04T09:03:26.1276915Z depends: i2c-core,drm 2025-12-04T09:03:26.1277215Z retpoline: Y 2025-12-04T09:03:26.1277461Z name: nvidia 2025-12-04T09:03:26.1277892Z vermagic: 6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 2025-12-04T09:03:26.1278465Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 2025-12-04T09:03:26.1279431Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] 
(charp) 2025-12-04T09:03:26.1280002Z parm: NVreg_ResmanDebugLevel:int 2025-12-04T09:03:26.1280391Z parm: NVreg_RmLogonRC:int 2025-12-04T09:03:26.1280759Z parm: NVreg_ModifyDeviceFiles:int 2025-12-04T09:03:26.1281135Z parm: NVreg_DeviceFileUID:int 2025-12-04T09:03:26.1281504Z parm: NVreg_DeviceFileGID:int 2025-12-04T09:03:26.1281876Z parm: NVreg_DeviceFileMode:int 2025-12-04T09:03:26.1282304Z parm: NVreg_InitializeSystemMemoryAllocations:int 2025-12-04T09:03:26.1282778Z parm: NVreg_UsePageAttributeTable:int 2025-12-04T09:03:26.1283183Z parm: NVreg_EnablePCIeGen3:int 2025-12-04T09:03:26.1283558Z parm: NVreg_EnableMSI:int 2025-12-04T09:03:26.1283918Z parm: NVreg_EnableStreamMemOPs:int 2025-12-04T09:03:26.1284358Z parm: NVreg_RestrictProfilingToAdminUsers:int 2025-12-04T09:03:26.1284842Z parm: NVreg_PreserveVideoMemoryAllocations:int 2025-12-04T09:03:26.1285402Z parm: NVreg_EnableS0ixPowerManagement:int 2025-12-04T09:03:26.1285893Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2025-12-04T09:03:26.1286380Z parm: NVreg_DynamicPowerManagement:int 2025-12-04T09:03:26.1286868Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2025-12-04T09:03:26.1287454Z parm: NVreg_EnableGpuFirmware:int 2025-12-04T09:03:26.1287841Z parm: NVreg_EnableGpuFirmwareLogs:int 2025-12-04T09:03:26.1288268Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2025-12-04T09:03:26.1288683Z parm: NVreg_EnableUserNUMAManagement:int 2025-12-04T09:03:26.1289253Z parm: NVreg_MemoryPoolSize:int 2025-12-04T09:03:26.1289637Z parm: NVreg_KMallocHeapMaxSize:int 2025-12-04T09:03:26.1290017Z parm: NVreg_VMallocHeapMaxSize:int 2025-12-04T09:03:26.1290402Z parm: NVreg_IgnoreMMIOCheck:int 2025-12-04T09:03:26.1290773Z parm: NVreg_NvLinkDisable:int 2025-12-04T09:03:26.1291169Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2025-12-04T09:03:26.1291601Z parm: NVreg_RegisterPCIDriver:int 2025-12-04T09:03:26.1292022Z parm: NVreg_RegisterPlatformDeviceDriver:int 2025-12-04T09:03:26.1292455Z parm: NVreg_EnableResizableBar:int 2025-12-04T09:03:26.1292840Z parm: NVreg_EnableDbgBreakpoint:int 2025-12-04T09:03:26.1293310Z parm: NVreg_EnableNonblockingOpen:int 2025-12-04T09:03:26.1293912Z parm: NVreg_CoherentGPUMemoryMode:charp 2025-12-04T09:03:26.1294315Z parm: NVreg_RegistryDwords:charp 2025-12-04T09:03:26.1294732Z parm: NVreg_RegistryDwordsPerDevice:charp 2025-12-04T09:03:26.1295136Z parm: NVreg_RmMsg:charp 2025-12-04T09:03:26.1295473Z parm: NVreg_GpuBlacklist:charp 2025-12-04T09:03:26.1295872Z parm: NVreg_TemporaryFilePath:charp 2025-12-04T09:03:26.1296271Z parm: NVreg_ExcludedGpus:charp 2025-12-04T09:03:26.1296647Z parm: NVreg_DmaRemapPeerMmio:int 2025-12-04T09:03:26.1297054Z parm: NVreg_RmNvlinkBandwidth:charp 2025-12-04T09:03:26.1297638Z parm: NVreg_RmNvlinkBandwidthLinkCount:int 2025-12-04T09:03:26.1298078Z parm: NVreg_ImexChannelCount:int 2025-12-04T09:03:26.1298551Z parm: NVreg_CreateImexChannel0:int 2025-12-04T09:03:26.1298981Z parm: NVreg_GrdmaPciTopoCheckOverride:int 2025-12-04T09:03:26.1299403Z parm: rm_firmware_active:charp 2025-12-04T09:03:26.1299760Z + HAS_NVIDIA_DRIVER=0 2025-12-04T09:03:26.1300052Z ++ command -v nvidia-smi 2025-12-04T09:03:26.1300364Z + '[' -x /usr/bin/nvidia-smi ']' 2025-12-04T09:03:26.1300661Z + set +e 2025-12-04T09:03:26.1301037Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0 2025-12-04T09:03:29.2240718Z + INSTALLED_DRIVER_VERSION=580.82.07 2025-12-04T09:03:29.2241166Z + NVIDIA_SMI_STATUS=0 2025-12-04T09:03:29.2241774Z + '[' 0 -ne 0 ']' 2025-12-04T09:03:29.2242060Z + '[' 580.82.07 
'!=' 580.82.07 ']' 2025-12-04T09:03:29.2242379Z + HAS_NVIDIA_DRIVER=1 2025-12-04T09:03:29.2242947Z + echo 'NVIDIA driver (580.82.07) has already been installed. Skipping NVIDIA driver installation' 2025-12-04T09:03:29.2243540Z + set -e 2025-12-04T09:03:29.2243778Z + '[' 1 -eq 0 ']' 2025-12-04T09:03:29.2244261Z NVIDIA driver (580.82.07) has already been installed. Skipping NVIDIA driver installation 2025-12-04T09:03:29.2244852Z + post_install_nvidia_driver_common 2025-12-04T09:03:29.2246250Z + sudo modprobe nvidia 2025-12-04T09:03:29.4055151Z + echo 'After installing NVIDIA driver' 2025-12-04T09:03:29.4055544Z + lspci 2025-12-04T09:03:29.4055809Z After installing NVIDIA driver 2025-12-04T09:03:29.4176388Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-12-04T09:03:29.4177031Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-12-04T09:03:29.4177721Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-12-04T09:03:29.4178405Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2025-12-04T09:03:29.4179246Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller 2025-12-04T09:03:29.4179921Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-12-04T09:03:29.4180550Z 00:1b.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:29.4181134Z 00:1c.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:29.4181712Z 00:1d.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:29.4182275Z 00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:29.4182893Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. 
NVMe SSD Controller 2025-12-04T09:03:29.4183393Z + lsmod 2025-12-04T09:03:29.4204718Z Module Size Used by 2025-12-04T09:03:29.4205180Z nvidia_uvm 1925120 0 2025-12-04T09:03:29.4205555Z nvidia 14286848 1 nvidia_uvm 2025-12-04T09:03:29.4205929Z drm 602112 1 nvidia 2025-12-04T09:03:29.4206300Z drm_panel_orientation_quirks 32768 1 drm 2025-12-04T09:03:29.4206677Z backlight 24576 1 drm 2025-12-04T09:03:29.4207023Z i2c_core 110592 2 nvidia,drm 2025-12-04T09:03:29.4207390Z xt_conntrack 16384 1 2025-12-04T09:03:29.4207692Z nft_chain_nat 16384 3 2025-12-04T09:03:29.4208006Z xt_MASQUERADE 20480 1 2025-12-04T09:03:29.4208370Z nf_nat 57344 2 nft_chain_nat,xt_MASQUERADE 2025-12-04T09:03:29.4208761Z nf_conntrack_netlink 57344 0 2025-12-04T09:03:29.4209241Z nf_conntrack 184320 4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE 2025-12-04T09:03:29.4209780Z nf_defrag_ipv6 24576 1 nf_conntrack 2025-12-04T09:03:29.4210163Z nf_defrag_ipv4 16384 1 nf_conntrack 2025-12-04T09:03:29.4210503Z xfrm_user 57344 1 2025-12-04T09:03:29.4210821Z xfrm_algo 16384 1 xfrm_user 2025-12-04T09:03:29.4211172Z xt_addrtype 16384 2 2025-12-04T09:03:29.4211469Z nft_compat 20480 4 2025-12-04T09:03:29.4212054Z nf_tables 311296 57 nft_compat,nft_chain_nat 2025-12-04T09:03:29.4212570Z nfnetlink 20480 4 nft_compat,nf_conntrack_netlink,nf_tables 2025-12-04T09:03:29.4213136Z br_netfilter 36864 0 2025-12-04T09:03:29.4213597Z bridge 323584 1 br_netfilter 2025-12-04T09:03:29.4214143Z stp 16384 1 bridge 2025-12-04T09:03:29.4214487Z llc 16384 2 bridge,stp 2025-12-04T09:03:29.4214845Z overlay 167936 0 2025-12-04T09:03:29.4215162Z tls 139264 0 2025-12-04T09:03:29.4215472Z nls_ascii 16384 1 2025-12-04T09:03:29.4215766Z nls_cp437 20480 1 2025-12-04T09:03:29.4216073Z sunrpc 700416 1 2025-12-04T09:03:29.4216378Z vfat 24576 1 2025-12-04T09:03:29.4216671Z fat 86016 1 vfat 2025-12-04T09:03:29.4216996Z ena 184320 0 2025-12-04T09:03:29.4217295Z i8042 45056 0 2025-12-04T09:03:29.4217592Z serio 28672 3 i8042 2025-12-04T09:03:29.4217937Z skx_edac_common 28672 0 2025-12-04T09:03:29.4218248Z button 24576 0 2025-12-04T09:03:29.4218554Z ghash_clmulni_intel 16384 0 2025-12-04T09:03:29.4218881Z sch_fq_codel 20480 33 2025-12-04T09:03:29.4219208Z dm_mod 188416 0 2025-12-04T09:03:29.4219502Z fuse 184320 1 2025-12-04T09:03:29.4219804Z loop 36864 0 2025-12-04T09:03:29.4220112Z configfs 57344 1 2025-12-04T09:03:29.4220423Z dmi_sysfs 20480 0 2025-12-04T09:03:29.4220720Z crc32_pclmul 16384 0 2025-12-04T09:03:29.4221033Z crc32c_intel 24576 0 2025-12-04T09:03:29.4221342Z efivarfs 24576 1 2025-12-04T09:03:29.4221636Z + modinfo nvidia 2025-12-04T09:03:29.4222152Z filename: /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko 2025-12-04T09:03:29.4222714Z import_ns: DMA_BUF 2025-12-04T09:03:29.4223021Z alias: char-major-195-* 2025-12-04T09:03:29.4223343Z version: 580.82.07 2025-12-04T09:03:29.4223647Z supported: external 2025-12-04T09:03:29.4223964Z license: Dual MIT/GPL 2025-12-04T09:03:29.4224303Z firmware: nvidia/580.82.07/gsp_tu10x.bin 2025-12-04T09:03:29.4224722Z firmware: nvidia/580.82.07/gsp_ga10x.bin 2025-12-04T09:03:29.4225122Z srcversion: BA7240A71DCF7DC6FE88C1D 2025-12-04T09:03:29.4225518Z alias: of:N*T*Cnvidia,tegra264-displayC* 2025-12-04T09:03:29.4226061Z alias: of:N*T*Cnvidia,tegra264-display 2025-12-04T09:03:29.4226480Z alias: of:N*T*Cnvidia,tegra234-displayC* 2025-12-04T09:03:29.4226903Z alias: of:N*T*Cnvidia,tegra234-display 2025-12-04T09:03:29.4227305Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2025-12-04T09:03:29.4227709Z alias: 
pci:v000010DEd*sv*sd*bc03sc02i00* 2025-12-04T09:03:29.4228113Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2025-12-04T09:03:29.4228475Z depends: i2c-core,drm 2025-12-04T09:03:29.4228778Z retpoline: Y 2025-12-04T09:03:29.4229057Z name: nvidia 2025-12-04T09:03:29.4229498Z vermagic: 6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 2025-12-04T09:03:29.4230083Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 2025-12-04T09:03:29.4230617Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2025-12-04T09:03:29.4231130Z parm: NVreg_ResmanDebugLevel:int 2025-12-04T09:03:29.4231506Z parm: NVreg_RmLogonRC:int 2025-12-04T09:03:29.4231858Z parm: NVreg_ModifyDeviceFiles:int 2025-12-04T09:03:29.4232240Z parm: NVreg_DeviceFileUID:int 2025-12-04T09:03:29.4232608Z parm: NVreg_DeviceFileGID:int 2025-12-04T09:03:29.4232964Z parm: NVreg_DeviceFileMode:int 2025-12-04T09:03:29.4233400Z parm: NVreg_InitializeSystemMemoryAllocations:int 2025-12-04T09:03:29.4233870Z parm: NVreg_UsePageAttributeTable:int 2025-12-04T09:03:29.4234278Z parm: NVreg_EnablePCIeGen3:int 2025-12-04T09:03:29.4234739Z parm: NVreg_EnableMSI:int 2025-12-04T09:03:29.4235112Z parm: NVreg_EnableStreamMemOPs:int 2025-12-04T09:03:29.4235618Z parm: NVreg_RestrictProfilingToAdminUsers:int 2025-12-04T09:03:29.4236098Z parm: NVreg_PreserveVideoMemoryAllocations:int 2025-12-04T09:03:29.4236557Z parm: NVreg_EnableS0ixPowerManagement:int 2025-12-04T09:03:29.4237045Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2025-12-04T09:03:29.4237522Z parm: NVreg_DynamicPowerManagement:int 2025-12-04T09:03:29.4238031Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2025-12-04T09:03:29.4238528Z parm: NVreg_EnableGpuFirmware:int 2025-12-04T09:03:29.4238930Z parm: NVreg_EnableGpuFirmwareLogs:int 2025-12-04T09:03:29.4239356Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2025-12-04T09:03:29.4239799Z parm: NVreg_EnableUserNUMAManagement:int 2025-12-04T09:03:29.4240207Z parm: NVreg_MemoryPoolSize:int 2025-12-04T09:03:29.4240577Z parm: NVreg_KMallocHeapMaxSize:int 2025-12-04T09:03:29.4240976Z parm: NVreg_VMallocHeapMaxSize:int 2025-12-04T09:03:29.4241360Z parm: NVreg_IgnoreMMIOCheck:int 2025-12-04T09:03:29.4241720Z parm: NVreg_NvLinkDisable:int 2025-12-04T09:03:29.4242135Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2025-12-04T09:03:29.4242566Z parm: NVreg_RegisterPCIDriver:int 2025-12-04T09:03:29.4242993Z parm: NVreg_RegisterPlatformDeviceDriver:int 2025-12-04T09:03:29.4243409Z parm: NVreg_EnableResizableBar:int 2025-12-04T09:03:29.4243811Z parm: NVreg_EnableDbgBreakpoint:int 2025-12-04T09:03:29.4244224Z parm: NVreg_EnableNonblockingOpen:int 2025-12-04T09:03:29.4244637Z parm: NVreg_CoherentGPUMemoryMode:charp 2025-12-04T09:03:29.4245047Z parm: NVreg_RegistryDwords:charp 2025-12-04T09:03:29.4245455Z parm: NVreg_RegistryDwordsPerDevice:charp 2025-12-04T09:03:29.4245839Z parm: NVreg_RmMsg:charp 2025-12-04T09:03:29.4246182Z parm: NVreg_GpuBlacklist:charp 2025-12-04T09:03:29.4246577Z parm: NVreg_TemporaryFilePath:charp 2025-12-04T09:03:29.4247065Z parm: NVreg_ExcludedGpus:charp 2025-12-04T09:03:29.4247433Z parm: NVreg_DmaRemapPeerMmio:int 2025-12-04T09:03:29.4247915Z parm: NVreg_RmNvlinkBandwidth:charp 2025-12-04T09:03:29.4248306Z parm: NVreg_RmNvlinkBandwidthLinkCount:int 2025-12-04T09:03:29.4248679Z parm: NVreg_ImexChannelCount:int 2025-12-04T09:03:29.4249038Z parm: NVreg_CreateImexChannel0:int 2025-12-04T09:03:29.4249421Z parm: NVreg_GrdmaPciTopoCheckOverride:int 2025-12-04T09:03:29.4249785Z parm: rm_firmware_active:charp 
2025-12-04T09:03:29.4250099Z + set +e 2025-12-04T09:03:29.4250311Z + nvidia-smi 2025-12-04T09:03:31.2294036Z Thu Dec 4 09:03:31 2025 2025-12-04T09:03:31.2294782Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:03:31.2295427Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 | 2025-12-04T09:03:31.2296078Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:03:31.2296695Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T09:03:31.2297368Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-12-04T09:03:31.2297907Z | | | MIG M. | 2025-12-04T09:03:31.2298311Z |=========================================+========================+======================| 2025-12-04T09:03:31.2674214Z | 0 Tesla T4 Off | 00000000:00:1B.0 Off | 0 | 2025-12-04T09:03:31.2675045Z | N/A 26C P0 25W / 70W | 0MiB / 15360MiB | 4% Default | 2025-12-04T09:03:31.2675795Z | | | N/A | 2025-12-04T09:03:31.2676410Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:03:31.2676935Z | 1 Tesla T4 Off | 00000000:00:1C.0 Off | 0 | 2025-12-04T09:03:31.2677447Z | N/A 26C P0 24W / 70W | 0MiB / 15360MiB | 1% Default | 2025-12-04T09:03:31.2677902Z | | | N/A | 2025-12-04T09:03:31.2678369Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:03:31.2679148Z | 2 Tesla T4 Off | 00000000:00:1D.0 Off | 0 | 2025-12-04T09:03:31.2679839Z | N/A 27C P0 24W / 70W | 0MiB / 15360MiB | 4% Default | 2025-12-04T09:03:31.2680324Z | | | N/A | 2025-12-04T09:03:31.2680804Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:03:31.2681358Z | 3 Tesla T4 Off | 00000000:00:1E.0 Off | 0 | 2025-12-04T09:03:31.2681883Z | N/A 26C P0 25W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T09:03:31.2682353Z | | | N/A | 2025-12-04T09:03:31.2682827Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:03:31.2683201Z 2025-12-04T09:03:31.2683412Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:03:31.2683947Z | Processes: | 2025-12-04T09:03:31.2684502Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T09:03:31.2685005Z | ID ID Usage | 2025-12-04T09:03:31.2685436Z |=========================================================================================| 2025-12-04T09:03:31.2696783Z | No running processes found | 2025-12-04T09:03:31.2697392Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:03:32.9090499Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T09:03:34.6972165Z Tesla T4 2025-12-04T09:03:36.0097834Z + NVIDIA_SMI_STATUS=0 2025-12-04T09:03:36.0098205Z + '[' 0 -eq 0 ']' 2025-12-04T09:03:36.0098490Z + echo 'INFO: Ignoring allowed status 0' 2025-12-04T09:03:36.0098853Z + set -e 2025-12-04T09:03:36.0099111Z INFO: Ignoring allowed status 0 2025-12-04T09:03:36.0104535Z == Installing nvidia container toolkit for amzn2023 == 2025-12-04T09:03:36.0108529Z + sudo yum install -y yum-utils 2025-12-04T09:03:36.5161860Z Last metadata expiration check: 0:07:29 ago on Thu Dec 4 08:56:07 2025. 2025-12-04T09:03:36.5465602Z Package dnf-utils-4.3.0-13.amzn2023.0.5.noarch is already installed. 2025-12-04T09:03:36.6075986Z Dependencies resolved. 
2025-12-04T09:03:36.6381491Z Nothing to do. 2025-12-04T09:03:36.6381774Z Complete! 2025-12-04T09:03:36.7516748Z + [[ amzn2023 == \a\m\z\n\2\0\2\3 ]] 2025-12-04T09:03:36.7517772Z + YUM_REPO_URL=https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo 2025-12-04T09:03:36.7518871Z + sudo yum-config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo 2025-12-04T09:03:37.0916282Z Adding repo from: https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo 2025-12-04T09:03:37.1417992Z + sudo yum install -y nvidia-container-toolkit-1.17.8 libnvidia-container-tools-1.17.8 libnvidia-container1-1.17.8 nvidia-container-toolkit-base-1.17.8 2025-12-04T09:03:37.7045023Z nvidia-container-toolkit 22 kB/s | 833 B 00:00 2025-12-04T09:03:37.7994560Z Dependencies resolved. 2025-12-04T09:03:37.8297180Z ================================================================================ 2025-12-04T09:03:37.8297733Z Package Arch Version Repository Size 2025-12-04T09:03:37.8298228Z ================================================================================ 2025-12-04T09:03:37.8298607Z Downgrading: 2025-12-04T09:03:37.8299052Z libnvidia-container-tools x86_64 1.17.8-1 nvidia-container-toolkit 40 k 2025-12-04T09:03:37.8299769Z libnvidia-container1 x86_64 1.17.8-1 nvidia-container-toolkit 1.0 M 2025-12-04T09:03:37.8300470Z nvidia-container-toolkit x86_64 1.17.8-1 nvidia-container-toolkit 1.2 M 2025-12-04T09:03:37.8301206Z nvidia-container-toolkit-base x86_64 1.17.8-1 nvidia-container-toolkit 5.8 M 2025-12-04T09:03:37.8301653Z 2025-12-04T09:03:37.8301783Z Transaction Summary 2025-12-04T09:03:37.8302083Z ================================================================================ 2025-12-04T09:03:37.8302479Z Downgrade 4 Packages 2025-12-04T09:03:37.8302658Z 2025-12-04T09:03:37.8302774Z Total download size: 8.0 M 2025-12-04T09:03:37.8303086Z Downloading Packages: 2025-12-04T09:03:37.9096144Z (1/4): libnvidia-container1-1.17.8-1.x86_64.rpm 13 MB/s | 1.0 MB 00:00 2025-12-04T09:03:37.9580125Z (2/4): libnvidia-container-tools-1.17.8-1.x86_6 319 kB/s | 40 kB 00:00 2025-12-04T09:03:38.0478009Z (3/4): nvidia-container-toolkit-1.17.8-1.x86_64 5.7 MB/s | 1.2 MB 00:00 2025-12-04T09:03:38.2372534Z (4/4): nvidia-container-toolkit-base-1.17.8-1.x 18 MB/s | 5.8 MB 00:00 2025-12-04T09:03:38.2381071Z -------------------------------------------------------------------------------- 2025-12-04T09:03:38.2384830Z Total 20 MB/s | 8.0 MB 00:00 2025-12-04T09:03:38.2387733Z Running transaction check 2025-12-04T09:03:38.2535396Z Transaction check succeeded. 2025-12-04T09:03:38.2535789Z Running transaction test 2025-12-04T09:03:38.3047306Z Transaction test succeeded. 
2025-12-04T09:03:38.3049001Z Running transaction 2025-12-04T09:03:39.5151047Z Preparing : 1/1 2025-12-04T09:03:39.7371565Z Downgrading : nvidia-container-toolkit-base-1.17.8-1.x86_64 1/8 2025-12-04T09:03:39.7981294Z Downgrading : libnvidia-container1-1.17.8-1.x86_64 2/8 2025-12-04T09:03:39.9158204Z Running scriptlet: libnvidia-container1-1.17.8-1.x86_64 2/8 2025-12-04T09:03:40.0950719Z Downgrading : libnvidia-container-tools-1.17.8-1.x86_64 3/8 2025-12-04T09:03:40.1597169Z Downgrading : nvidia-container-toolkit-1.17.8-1.x86_64 4/8 2025-12-04T09:03:40.2497165Z Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64 4/8 2025-12-04T09:03:40.2542822Z Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64 5/8 2025-12-04T09:03:40.2544052Z Cleanup : nvidia-container-toolkit-1.18.1-1.x86_64 5/8 2025-12-04T09:03:40.3284564Z Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64 5/8 2025-12-04T09:03:40.3329481Z Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64 6/8 2025-12-04T09:03:40.3330856Z Cleanup : libnvidia-container-tools-1.18.1-1.x86_64 6/8 2025-12-04T09:03:40.4047440Z Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64 6/8 2025-12-04T09:03:40.4096413Z Running scriptlet: libnvidia-container1-1.18.1-1.x86_64 7/8 2025-12-04T09:03:40.4097571Z Cleanup : libnvidia-container1-1.18.1-1.x86_64 7/8 2025-12-04T09:03:40.4603922Z Running scriptlet: libnvidia-container1-1.18.1-1.x86_64 7/8 2025-12-04T09:03:40.4651379Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8 2025-12-04T09:03:40.4652118Z Cleanup : nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8 2025-12-04T09:03:40.5342273Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8 2025-12-04T09:03:40.5880860Z Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64 8/8 2025-12-04T09:04:50.1961977Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8 2025-12-04T09:04:50.1962831Z Verifying : libnvidia-container-tools-1.17.8-1.x86_64 1/8 2025-12-04T09:04:50.1963499Z Verifying : libnvidia-container-tools-1.18.1-1.x86_64 2/8 2025-12-04T09:04:50.1964160Z Verifying : libnvidia-container1-1.17.8-1.x86_64 3/8 2025-12-04T09:04:50.1964807Z Verifying : libnvidia-container1-1.18.1-1.x86_64 4/8 2025-12-04T09:04:50.1965442Z Verifying : nvidia-container-toolkit-1.17.8-1.x86_64 5/8 2025-12-04T09:04:50.1966112Z Verifying : nvidia-container-toolkit-1.18.1-1.x86_64 6/8 2025-12-04T09:04:50.1966774Z Verifying : nvidia-container-toolkit-base-1.17.8-1.x86_64 7/8 2025-12-04T09:04:50.3557514Z Verifying : nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8================================================================================ 2025-12-04T09:04:50.3558233Z WARNING: 2025-12-04T09:04:50.3558538Z A newer release of "Amazon Linux" is available. 
2025-12-04T09:04:50.3558836Z 2025-12-04T09:04:50.3558946Z Available Versions: 2025-12-04T09:04:50.3559124Z 2025-12-04T09:04:50.3559247Z Version 2023.9.20250929: 2025-12-04T09:04:50.3559622Z Run the following command to upgrade to 2023.9.20250929: 2025-12-04T09:04:50.3559948Z 2025-12-04T09:04:50.3560091Z dnf upgrade --releasever=2023.9.20250929 2025-12-04T09:04:50.3560347Z 2025-12-04T09:04:50.3560462Z Release notes: 2025-12-04T09:04:50.3560979Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20250929.html 2025-12-04T09:04:50.3561456Z 2025-12-04T09:04:50.3561571Z Version 2023.9.20251014: 2025-12-04T09:04:50.3561939Z Run the following command to upgrade to 2023.9.20251014: 2025-12-04T09:04:50.3562245Z 2025-12-04T09:04:50.3562392Z dnf upgrade --releasever=2023.9.20251014 2025-12-04T09:04:50.3562642Z 2025-12-04T09:04:50.3562739Z Release notes: 2025-12-04T09:04:50.3563219Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251014.html 2025-12-04T09:04:50.3563670Z 2025-12-04T09:04:50.3563786Z Version 2023.9.20251020: 2025-12-04T09:04:50.3564155Z Run the following command to upgrade to 2023.9.20251020: 2025-12-04T09:04:50.3564460Z 2025-12-04T09:04:50.3564591Z dnf upgrade --releasever=2023.9.20251020 2025-12-04T09:04:50.3564852Z 2025-12-04T09:04:50.3564948Z Release notes: 2025-12-04T09:04:50.3565420Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251020.html 2025-12-04T09:04:50.3565876Z 2025-12-04T09:04:50.3565978Z Version 2023.9.20251027: 2025-12-04T09:04:50.3566349Z Run the following command to upgrade to 2023.9.20251027: 2025-12-04T09:04:50.3566663Z 2025-12-04T09:04:50.3566794Z dnf upgrade --releasever=2023.9.20251027 2025-12-04T09:04:50.3567042Z 2025-12-04T09:04:50.3567154Z Release notes: 2025-12-04T09:04:50.3567614Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251027.html 2025-12-04T09:04:50.3568074Z 2025-12-04T09:04:50.3568173Z Version 2023.9.20251105: 2025-12-04T09:04:50.3568540Z Run the following command to upgrade to 2023.9.20251105: 2025-12-04T09:04:50.3568842Z 2025-12-04T09:04:50.3568984Z dnf upgrade --releasever=2023.9.20251105 2025-12-04T09:04:50.3569234Z 2025-12-04T09:04:50.3569437Z Release notes: 2025-12-04T09:04:50.3569988Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251105.html 2025-12-04T09:04:50.3570397Z 2025-12-04T09:04:50.3570831Z Version 2023.9.20251110: 2025-12-04T09:04:50.3571169Z Run the following command to upgrade to 2023.9.20251110: 2025-12-04T09:04:50.3571629Z 2025-12-04T09:04:50.3571752Z dnf upgrade --releasever=2023.9.20251110 2025-12-04T09:04:50.3571993Z 2025-12-04T09:04:50.3572080Z Release notes: 2025-12-04T09:04:50.3572518Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251110.html 2025-12-04T09:04:50.3572930Z 2025-12-04T09:04:50.3573131Z Version 2023.9.20251117: 2025-12-04T09:04:50.3573651Z Run the following command to upgrade to 2023.9.20251117: 2025-12-04T09:04:50.3573964Z 2025-12-04T09:04:50.3574112Z dnf upgrade --releasever=2023.9.20251117 2025-12-04T09:04:50.3574369Z 2025-12-04T09:04:50.3574468Z Release notes: 2025-12-04T09:04:50.3574957Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251117.html 2025-12-04T09:04:50.3575435Z 2025-12-04T09:04:50.3575566Z ================================================================================ 2025-12-04T09:04:50.4163256Z 2025-12-04T09:04:50.4163591Z 2025-12-04T09:04:50.4163909Z Downgraded: 2025-12-04T09:04:50.4164425Z 
libnvidia-container-tools-1.17.8-1.x86_64 2025-12-04T09:04:50.4165116Z libnvidia-container1-1.17.8-1.x86_64 2025-12-04T09:04:50.4165772Z nvidia-container-toolkit-1.17.8-1.x86_64 2025-12-04T09:04:50.4166470Z nvidia-container-toolkit-base-1.17.8-1.x86_64 2025-12-04T09:04:50.4166890Z 2025-12-04T09:04:50.4166998Z Complete! 2025-12-04T09:04:50.4895220Z + sudo systemctl restart docker 2025-12-04T09:05:00.2781688Z Thu Dec 4 09:05:00 2025 2025-12-04T09:05:00.2782183Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:05:00.2782817Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 | 2025-12-04T09:05:00.2783464Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:05:00.2784092Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T09:05:00.2784766Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-12-04T09:05:00.2785305Z | | | MIG M. | 2025-12-04T09:05:00.2785717Z |=========================================+========================+======================| 2025-12-04T09:05:00.3184906Z | 0 Tesla T4 On | 00000000:00:1B.0 Off | 0 | 2025-12-04T09:05:00.3185476Z | N/A 26C P0 25W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T09:05:00.3185982Z | | | N/A | 2025-12-04T09:05:00.3186499Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:05:00.3187034Z | 1 Tesla T4 On | 00000000:00:1C.0 Off | 0 | 2025-12-04T09:05:00.3187565Z | N/A 26C P0 24W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T09:05:00.3188036Z | | | N/A | 2025-12-04T09:05:00.3188528Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:05:00.3189055Z | 2 Tesla T4 On | 00000000:00:1D.0 Off | 0 | 2025-12-04T09:05:00.3189571Z | N/A 27C P0 24W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T09:05:00.3190038Z | | | N/A | 2025-12-04T09:05:00.3190517Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:05:00.3191414Z | 3 Tesla T4 On | 00000000:00:1E.0 Off | 0 | 2025-12-04T09:05:00.3192041Z | N/A 26C P0 25W / 70W | 0MiB / 15360MiB | 9% Default | 2025-12-04T09:05:00.3192494Z | | | N/A | 2025-12-04T09:05:00.3192961Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:05:00.3193322Z 2025-12-04T09:05:00.3193528Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:05:00.3194048Z | Processes: | 2025-12-04T09:05:00.3194578Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T09:05:00.3209460Z | ID ID Usage | 2025-12-04T09:05:00.3209904Z |=========================================================================================| 2025-12-04T09:05:00.3210614Z | No running processes found | 2025-12-04T09:05:00.3211224Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:05:00.7100210Z Unable to find image 'public.ecr.aws/docker/library/python:3.13' locally 2025-12-04T09:05:00.8771050Z 3.13: Pulling from docker/library/python 2025-12-04T09:05:00.9615092Z 53c88f1dfeb7: Pulling fs layer 2025-12-04T09:05:00.9615480Z eae668646f44: Pulling fs layer 2025-12-04T09:05:00.9615808Z ff2e6e687b6c: Pulling fs layer 2025-12-04T09:05:00.9616144Z 7c40a3faff76: Pulling fs layer 2025-12-04T09:05:00.9616475Z 967a3b1c8fef: Pulling fs layer 2025-12-04T09:05:00.9616789Z a64e1a44f22a: 
Pulling fs layer 2025-12-04T09:05:00.9617120Z 52655f8a5bcc: Pulling fs layer 2025-12-04T09:05:00.9617495Z 967a3b1c8fef: Waiting 2025-12-04T09:05:00.9617760Z a64e1a44f22a: Waiting 2025-12-04T09:05:00.9618064Z 52655f8a5bcc: Waiting 2025-12-04T09:05:00.9618348Z 7c40a3faff76: Waiting 2025-12-04T09:05:01.0937478Z eae668646f44: Verifying Checksum 2025-12-04T09:05:01.0937850Z eae668646f44: Download complete 2025-12-04T09:05:01.2130655Z 53c88f1dfeb7: Verifying Checksum 2025-12-04T09:05:01.2131086Z 53c88f1dfeb7: Download complete 2025-12-04T09:05:01.2904178Z ff2e6e687b6c: Verifying Checksum 2025-12-04T09:05:01.2904594Z ff2e6e687b6c: Download complete 2025-12-04T09:05:01.3050028Z 967a3b1c8fef: Verifying Checksum 2025-12-04T09:05:01.3050558Z 967a3b1c8fef: Download complete 2025-12-04T09:05:01.3615088Z 52655f8a5bcc: Verifying Checksum 2025-12-04T09:05:01.3615493Z 52655f8a5bcc: Download complete 2025-12-04T09:05:01.4719608Z a64e1a44f22a: Verifying Checksum 2025-12-04T09:05:01.4720052Z a64e1a44f22a: Download complete 2025-12-04T09:05:02.1162304Z 7c40a3faff76: Verifying Checksum 2025-12-04T09:05:02.1163141Z 7c40a3faff76: Download complete 2025-12-04T09:05:02.3903866Z 53c88f1dfeb7: Pull complete 2025-12-04T09:05:02.9040826Z eae668646f44: Pull complete 2025-12-04T09:05:04.6312532Z ff2e6e687b6c: Pull complete 2025-12-04T09:05:09.5335136Z 7c40a3faff76: Pull complete 2025-12-04T09:05:09.7216439Z 967a3b1c8fef: Pull complete 2025-12-04T09:05:10.2896017Z a64e1a44f22a: Pull complete 2025-12-04T09:05:10.3034119Z 52655f8a5bcc: Pull complete 2025-12-04T09:05:10.3117157Z Digest: sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0 2025-12-04T09:05:10.3138146Z Status: Downloaded newer image for public.ecr.aws/docker/library/python:3.13 2025-12-04T09:05:19.1254836Z Thu Dec 4 09:05:19 2025 2025-12-04T09:05:19.1255350Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:05:19.1255984Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 | 2025-12-04T09:05:19.1256589Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:05:19.1259189Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T09:05:19.1260012Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-12-04T09:05:19.1260710Z | | | MIG M. 
| 2025-12-04T09:05:19.1261117Z |=========================================+========================+======================| 2025-12-04T09:05:19.1853676Z | 0 Tesla T4 On | 00000000:00:1B.0 Off | 0 | 2025-12-04T09:05:19.1854329Z | N/A 25C P8 13W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T09:05:19.1854823Z | | | N/A | 2025-12-04T09:05:19.1855311Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:05:19.1855860Z | 1 Tesla T4 On | 00000000:00:1C.0 Off | 0 | 2025-12-04T09:05:19.1856403Z | N/A 25C P8 13W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T09:05:19.1856908Z | | | N/A | 2025-12-04T09:05:19.1857443Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:05:19.1857991Z | 2 Tesla T4 On | 00000000:00:1D.0 Off | 0 | 2025-12-04T09:05:19.1858512Z | N/A 27C P8 13W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T09:05:19.1858966Z | | | N/A | 2025-12-04T09:05:19.1859453Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:05:19.1859990Z | 3 Tesla T4 On | 00000000:00:1E.0 Off | 0 | 2025-12-04T09:05:19.1860517Z | N/A 25C P8 13W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T09:05:19.1860969Z | | | N/A | 2025-12-04T09:05:19.1861462Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:05:19.1861839Z 2025-12-04T09:05:19.1862054Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:05:19.1862588Z | Processes: | 2025-12-04T09:05:19.1863126Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T09:05:19.1863642Z | ID ID Usage | 2025-12-04T09:05:19.1864067Z |=========================================================================================| 2025-12-04T09:05:19.1881906Z | No running processes found | 2025-12-04T09:05:19.1882536Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:05:21.3180502Z Command completed after 1 attempt(s). 
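Editor's note: at this point the container toolkit has been pinned to 1.17.8, docker has been restarted, and nvidia-smi has been verified both on the host and through the container runtime, so the job moves on to identifying itself and starting a resource monitor. The next step below installs psutil and nvidia-ml-py and launches tools.stats.monitor in the background, redirecting to usage_log.txt. As a rough sketch of the kind of per-GPU sampling nvidia-ml-py enables — not the monitor's actual implementation, just the underlying NVML calls it builds on:

# Minimal sketch of per-GPU sampling with nvidia-ml-py (the bindings installed below);
# the real tools.stats.monitor script does more, this only shows the raw NVML calls.
import time
import pynvml  # provided by the nvidia-ml-py package

pynvml.nvmlInit()
try:
    handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
               for i in range(pynvml.nvmlDeviceGetCount())]
    for _ in range(3):                                       # a few samples, ~1 s apart
        for i, h in enumerate(handles):
            util = pynvml.nvmlDeviceGetUtilizationRates(h)   # .gpu / .memory in percent
            mem = pynvml.nvmlDeviceGetMemoryInfo(h)          # .used / .total in bytes
            print(f"gpu{i}: util={util.gpu}% "
                  f"mem={mem.used // 2**20}MiB/{mem.total // 2**20}MiB")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()

These are the same counters nvidia-smi itself reports, so a monitor built on them lines up with the utilization and memory columns in the tables printed earlier in this log.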
2025-12-04T09:05:21.3272301Z Prepare all required actions 2025-12-04T09:05:21.3306480Z ##[group]Run ./.github/actions/get-workflow-job-id 2025-12-04T09:05:21.3306831Z with: 2025-12-04T09:05:21.3307447Z github-token: *** 2025-12-04T09:05:21.3307689Z env: 2025-12-04T09:05:21.3307897Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:21.3308176Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:21.3308506Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:21.3308882Z ##[endgroup] 2025-12-04T09:05:21.3323996Z ##[group]Run set -eux 2025-12-04T09:05:21.3324275Z set -eux 2025-12-04T09:05:21.3324765Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}" 2025-12-04T09:05:21.3335287Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:21.3335714Z env: 2025-12-04T09:05:21.3335967Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:21.3336452Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:21.3336855Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:21.3337464Z GITHUB_TOKEN: *** 2025-12-04T09:05:21.3337721Z ##[endgroup] 2025-12-04T09:05:21.3369487Z + python3 .github/scripts/get_workflow_job_id.py 19922768520 i-02e8ffc45eb447a37 2025-12-04T09:05:22.7738703Z Setting output job-id=57116084892 2025-12-04T09:05:22.7739560Z Setting output job-name=linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 1, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:22.7865328Z ##[group]Run python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84 2025-12-04T09:05:22.7866107Z python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84 2025-12-04T09:05:22.7867120Z python3 -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 & 2025-12-04T09:05:22.7868005Z echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:05:22.7874797Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:22.7875198Z env: 2025-12-04T09:05:22.7875411Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:22.7875690Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:22.7876017Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:22.7876380Z JOB_ID: 57116084892 2025-12-04T09:05:22.7877017Z JOB_NAME: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 1, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:22.7877652Z WORKFLOW_NAME: trunk 2025-12-04T09:05:22.7877919Z WORKFLOW_RUN_ID: 19922768520 2025-12-04T09:05:22.7878207Z MONITOR_LOG_INTERVAL: 5 2025-12-04T09:05:22.7878494Z MONITOR_DATA_COLLECT_INTERVAL: 1 2025-12-04T09:05:22.7879153Z ##[endgroup] 2025-12-04T09:05:23.1044971Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T09:05:23.4839324Z Collecting psutil==5.9.8 2025-12-04T09:05:23.5018951Z Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (288 kB) 2025-12-04T09:05:23.5786325Z Collecting dataclasses_json==0.6.7 2025-12-04T09:05:23.5823310Z Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB) 2025-12-04T09:05:23.6098813Z Collecting nvidia-ml-py==11.525.84 2025-12-04T09:05:23.6139091Z Downloading nvidia_ml_py-11.525.84-py3-none-any.whl (34 kB) 2025-12-04T09:05:23.6468750Z Collecting typing-inspect<1,>=0.4.0 2025-12-04T09:05:23.6507599Z Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB) 2025-12-04T09:05:23.7656606Z Collecting marshmallow<4.0.0,>=3.18.0 
2025-12-04T09:05:23.7688028Z Downloading marshmallow-3.26.1-py3-none-any.whl (50 kB) 2025-12-04T09:05:23.8272095Z Collecting packaging>=17.0 2025-12-04T09:05:23.8310773Z Downloading packaging-25.0-py3-none-any.whl (66 kB) 2025-12-04T09:05:23.8559711Z Collecting mypy-extensions>=0.3.0 2025-12-04T09:05:23.8593161Z Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB) 2025-12-04T09:05:23.9095693Z Collecting typing-extensions>=3.7.4 2025-12-04T09:05:23.9128064Z Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB) 2025-12-04T09:05:24.0193071Z Installing collected packages: typing-extensions, packaging, mypy-extensions, typing-inspect, marshmallow, psutil, nvidia-ml-py, dataclasses-json 2025-12-04T09:05:24.3144718Z Successfully installed dataclasses-json-0.6.7 marshmallow-3.26.1 mypy-extensions-1.1.0 nvidia-ml-py-11.525.84 packaging-25.0 psutil-5.9.8 typing-extensions-4.15.0 typing-inspect-0.9.0 2025-12-04T09:05:24.5017460Z Prepare all required actions 2025-12-04T09:05:24.5017927Z Getting action download info 2025-12-04T09:05:24.6926345Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6) 2025-12-04T09:05:25.0423564Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093) 2025-12-04T09:05:25.3806966Z ##[group]Run ./.github/actions/download-build-artifacts 2025-12-04T09:05:25.3807534Z with: 2025-12-04T09:05:25.3807817Z name: linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T09:05:25.3808192Z s3-bucket: gha-artifacts 2025-12-04T09:05:25.3808467Z env: 2025-12-04T09:05:25.3808704Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:25.3808998Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:25.3809341Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:25.3809741Z ##[endgroup] 2025-12-04T09:05:25.3841468Z ##[group]Run seemethere/download-artifact-s3@v4 2025-12-04T09:05:25.3841852Z with: 2025-12-04T09:05:25.3842166Z name: linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T09:05:25.3842538Z s3-bucket: gha-artifacts 2025-12-04T09:05:25.3842838Z region: us-east-1 2025-12-04T09:05:25.3843082Z env: 2025-12-04T09:05:25.3843322Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:25.3843620Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:25.3843979Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:25.3844375Z ##[endgroup] 2025-12-04T09:05:25.8887527Z (node:62953) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-12-04T09:05:25.8888118Z 2025-12-04T09:05:25.8888355Z Please migrate your code to use AWS SDK for JavaScript (v3). 
2025-12-04T09:05:25.8888983Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-12-04T09:05:25.8889651Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-12-04T09:05:26.1206712Z Found 1 objects with prefix pytorch/pytorch/19922768520/linux-jammy-cuda12.8-py3.10-gcc11/ 2025-12-04T09:05:26.1207582Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2025-12-04T09:05:34.4019566Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2025-12-04T09:05:34.4025730Z Artifact download has finished successfully 2025-12-04T09:05:34.4277399Z ##[group]Run unzip -o artifacts.zip 2025-12-04T09:05:34.4277761Z unzip -o artifacts.zip 2025-12-04T09:05:34.4284765Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:34.4285200Z env: 2025-12-04T09:05:34.4285445Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:34.4285735Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:34.4286088Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:34.4286501Z ##[endgroup] 2025-12-04T09:05:34.4366403Z Archive: artifacts.zip 2025-12-04T09:05:34.4366818Z creating: dist/ 2025-12-04T09:05:34.4503685Z inflating: dist/.ninja_log 2025-12-04T09:05:36.9681948Z inflating: dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl 2025-12-04T09:05:36.9682538Z creating: build/ 2025-12-04T09:05:36.9682858Z creating: build/custom_test_artifacts/ 2025-12-04T09:05:36.9683338Z creating: build/custom_test_artifacts/custom-op-build/ 2025-12-04T09:05:36.9683905Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/ 2025-12-04T09:05:36.9684606Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:05:36.9690836Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:05:36.9691759Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/ 2025-12-04T09:05:36.9692829Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:05:36.9693965Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:05:36.9695091Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:05:36.9696057Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:05:36.9697176Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:05:36.9698081Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:05:36.9699086Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:05:36.9699935Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:05:36.9701015Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:05:36.9702303Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:05:36.9703296Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:05:36.9704850Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:05:36.9706823Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:05:36.9707762Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:05:36.9708601Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:05:36.9764382Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:05:36.9820546Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-12-04T09:05:36.9821841Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:05:36.9880544Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:05:36.9881815Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:05:36.9883077Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:05:36.9884354Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:05:36.9885601Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-12-04T09:05:36.9886827Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:05:36.9888061Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:05:36.9889263Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:05:36.9890443Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-12-04T09:05:36.9891567Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:05:36.9892663Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:05:36.9893822Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:05:36.9894901Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:05:36.9896173Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:05:36.9897236Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:05:36.9967970Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:05:36.9968914Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:05:37.0044274Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:05:37.0045368Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:05:37.0046070Z creating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:05:37.0046806Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2025-12-04T09:05:37.0047588Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2025-12-04T09:05:37.0048439Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2025-12-04T09:05:37.0049403Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2025-12-04T09:05:37.0050343Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 2025-12-04T09:05:37.0051208Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2025-12-04T09:05:37.0052111Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2025-12-04T09:05:37.0053121Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2025-12-04T09:05:37.0054210Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2025-12-04T09:05:37.0055146Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2025-12-04T09:05:37.0056061Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2025-12-04T09:05:37.0071537Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2025-12-04T09:05:37.0260907Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2025-12-04T09:05:37.0261784Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2025-12-04T09:05:37.0262735Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2025-12-04T09:05:37.0263799Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2025-12-04T09:05:37.0264819Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2025-12-04T09:05:37.0265869Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2025-12-04T09:05:37.0266808Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2025-12-04T09:05:37.0267764Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2025-12-04T09:05:37.0268719Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2025-12-04T09:05:37.0269675Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2025-12-04T09:05:37.0270602Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2025-12-04T09:05:37.0287184Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2025-12-04T09:05:37.0367128Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2025-12-04T09:05:37.0368579Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:05:37.0369488Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:05:37.0370312Z extracting: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2025-12-04T09:05:37.0371064Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2025-12-04T09:05:37.0371898Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2025-12-04T09:05:37.0372630Z inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc 2025-12-04T09:05:37.0373621Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2025-12-04T09:05:37.0374275Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2025-12-04T09:05:37.0374950Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2025-12-04T09:05:37.0535104Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2025-12-04T09:05:37.0588371Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2025-12-04T09:05:37.0589017Z creating: build/custom_test_artifacts/jit-hook-build/ 2025-12-04T09:05:37.0589578Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 2025-12-04T09:05:37.0590245Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:05:37.0596287Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:05:37.0597048Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/ 2025-12-04T09:05:37.0597797Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:05:37.0598596Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:05:37.0599375Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:05:37.0600288Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:05:37.0601211Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:05:37.0602055Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:05:37.0602882Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:05:37.0603690Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:05:37.0604661Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:05:37.0606045Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:05:37.0606940Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:05:37.0608415Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:05:37.0610578Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:05:37.0611489Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:05:37.0612303Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:05:37.0668537Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:05:37.0728012Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 
2025-12-04T09:05:37.0729260Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:05:37.0789534Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:05:37.0790904Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:05:37.0792114Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:05:37.0793363Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:05:37.0794645Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-12-04T09:05:37.0795816Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:05:37.0796989Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:05:37.0798171Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:05:37.0799326Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-12-04T09:05:37.0800395Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:05:37.0801444Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:05:37.0802466Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:05:37.0803496Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:05:37.0804485Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:05:37.0805507Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:05:37.0878203Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:05:37.0879539Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:05:37.0957070Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:05:37.0958014Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:05:37.0958703Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:05:37.0959431Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2025-12-04T09:05:37.0960204Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2025-12-04T09:05:37.0961095Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2025-12-04T09:05:37.0962088Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2025-12-04T09:05:37.0963055Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2025-12-04T09:05:37.0963944Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2025-12-04T09:05:37.0964873Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2025-12-04T09:05:37.0965795Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2025-12-04T09:05:37.0966726Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2025-12-04T09:05:37.0967649Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2025-12-04T09:05:37.0968744Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2025-12-04T09:05:37.0983119Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2025-12-04T09:05:37.1043732Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2025-12-04T09:05:37.1044911Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:05:37.1046003Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:05:37.1046800Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2025-12-04T09:05:37.1047545Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2025-12-04T09:05:37.1048273Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2025-12-04T09:05:37.1049014Z inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc 2025-12-04T09:05:37.1049688Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2025-12-04T09:05:37.1050323Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2025-12-04T09:05:37.1050964Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2025-12-04T09:05:37.1088986Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2025-12-04T09:05:37.1089647Z creating: build/custom_test_artifacts/custom-backend-build/ 2025-12-04T09:05:37.1090282Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2025-12-04T09:05:37.1091134Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:05:37.1098285Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:05:37.1099127Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/ 2025-12-04T09:05:37.1099973Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:05:37.1100870Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:05:37.1101753Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:05:37.1102748Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:05:37.1103776Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:05:37.1104839Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:05:37.1105743Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:05:37.1106603Z creating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:05:37.1107625Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:05:37.1108664Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:05:37.1109621Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:05:37.1111132Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:05:37.1112777Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:05:37.1113772Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:05:37.1114654Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:05:37.1170794Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:05:37.1229276Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-12-04T09:05:37.1230775Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:05:37.1287917Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:05:37.1289479Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:05:37.1290824Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:05:37.1292262Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:05:37.1293806Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-12-04T09:05:37.1295093Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:05:37.1296387Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:05:37.1297674Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:05:37.1298930Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-12-04T09:05:37.1300122Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:05:37.1301276Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:05:37.1302393Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:05:37.1303529Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:05:37.1304630Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:05:37.1305864Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:05:37.1376049Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:05:37.1377083Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:05:37.1452126Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:05:37.1453579Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:05:37.1454403Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:05:37.1455215Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2025-12-04T09:05:37.1456083Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2025-12-04T09:05:37.1457078Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2025-12-04T09:05:37.1458181Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2025-12-04T09:05:37.1459256Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2025-12-04T09:05:37.1460455Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2025-12-04T09:05:37.1461502Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2025-12-04T09:05:37.1462536Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2025-12-04T09:05:37.1463577Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2025-12-04T09:05:37.1464711Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2025-12-04T09:05:37.1465838Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2025-12-04T09:05:37.1466901Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2025-12-04T09:05:37.1577058Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2025-12-04T09:05:37.1578133Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2025-12-04T09:05:37.1579392Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2025-12-04T09:05:37.1580570Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2025-12-04T09:05:37.1581710Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2025-12-04T09:05:37.1582771Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2025-12-04T09:05:37.1583867Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2025-12-04T09:05:37.1584969Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 
2025-12-04T09:05:37.1586061Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2025-12-04T09:05:37.1587159Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2025-12-04T09:05:37.1588242Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2025-12-04T09:05:37.1604725Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2025-12-04T09:05:37.1657913Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2025-12-04T09:05:37.1659098Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:05:37.1660104Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:05:37.1661029Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2025-12-04T09:05:37.1661849Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2025-12-04T09:05:37.1662665Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2025-12-04T09:05:37.1663485Z inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc 2025-12-04T09:05:37.1664528Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2025-12-04T09:05:37.1665500Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2025-12-04T09:05:37.1666209Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2025-12-04T09:05:37.1766249Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2025-12-04T09:05:37.1804043Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2025-12-04T09:05:37.1804766Z creating: build/lib/ 2025-12-04T09:05:37.1882688Z inflating: build/lib/libprotobuf-lite.a 2025-12-04T09:05:37.2308481Z inflating: build/lib/libprotobuf.a 2025-12-04T09:05:37.2785411Z inflating: build/lib/libprotoc.a 2025-12-04T09:05:37.2795092Z inflating: build/lib/libpthreadpool.a 2025-12-04T09:05:37.2803224Z inflating: build/lib/libcpuinfo.a 2025-12-04T09:05:37.2810924Z inflating: build/lib/libcpuinfo_internals.a 2025-12-04T09:05:37.2811734Z inflating: build/lib/libclog.a 2025-12-04T09:05:37.2831709Z inflating: build/lib/libpytorch_qnnpack.a 2025-12-04T09:05:37.2833615Z inflating: build/lib/libnnpack_reference_layers.a 2025-12-04T09:05:37.2852557Z inflating: build/lib/libnnpack.a 2025-12-04T09:05:37.3031381Z inflating: build/lib/libmicrokernels-prod.a 2025-12-04T09:05:37.3849138Z inflating: build/lib/libmicrokernels-all.a 2025-12-04T09:05:37.3917521Z inflating: build/lib/libgtest.a 2025-12-04T09:05:37.3934133Z inflating: build/lib/libgmock.a 2025-12-04T09:05:37.3934657Z inflating: build/lib/libgtest_main.a 2025-12-04T09:05:37.3935069Z inflating: build/lib/libgmock_main.a 2025-12-04T09:05:37.4020531Z inflating: build/lib/libXNNPACK.a 2025-12-04T09:05:37.4094539Z inflating: build/lib/libbenchmark.a 2025-12-04T09:05:37.4095001Z inflating: build/lib/libbenchmark_main.a 2025-12-04T09:05:37.4095448Z inflating: build/lib/libjitprofiling.a 2025-12-04T09:05:37.4103370Z inflating: build/lib/libittnotify.a 2025-12-04T09:05:37.4168191Z inflating: build/lib/libasmjit.a 2025-12-04T09:05:37.5252854Z inflating: build/lib/libfbgemm.a 
2025-12-04T09:05:37.5281986Z inflating: build/lib/libtensorpipe_uv.a 2025-12-04T09:05:37.5798836Z inflating: build/lib/libtensorpipe.a 2025-12-04T09:05:37.6030941Z inflating: build/lib/libtensorpipe_cuda.a 2025-12-04T09:05:37.6160230Z inflating: build/lib/libgloo.a 2025-12-04T09:05:37.6205905Z inflating: build/lib/libonnx_proto.a 2025-12-04T09:05:37.6612442Z inflating: build/lib/libgloo_cuda.a 2025-12-04T09:05:37.7298075Z inflating: build/lib/libonnx.a 2025-12-04T09:05:37.7318322Z inflating: build/lib/libfmt.a 2025-12-04T09:05:38.6966470Z inflating: build/lib/libdnnl.a 2025-12-04T09:05:38.7418496Z inflating: build/lib/libkineto.a 2025-12-04T09:05:38.7530663Z inflating: build/lib/libc10.so 2025-12-04T09:05:38.7577021Z inflating: build/lib/libc10_cuda.so 2025-12-04T09:05:38.7578926Z inflating: build/lib/libcaffe2_nvrtc.so 2025-12-04T09:05:38.7580379Z inflating: build/lib/libtorch_global_deps.so 2025-12-04T09:05:41.7179755Z inflating: build/lib/libtorch_cpu.so 2025-12-04T09:05:41.7948442Z inflating: build/lib/libtorch_nvshmem.so 2025-12-04T09:05:44.7022499Z inflating: build/lib/libtorch_cuda.so 2025-12-04T09:05:44.7023597Z inflating: build/lib/libtorch.so 2025-12-04T09:05:44.7074825Z inflating: build/lib/libtorch_cuda_linalg.so 2025-12-04T09:05:44.7142606Z inflating: build/lib/libtorchbind_test.so 2025-12-04T09:05:44.7163168Z inflating: build/lib/libjitbackend_test.so 2025-12-04T09:05:44.7186103Z inflating: build/lib/libbackend_with_compiler.so 2025-12-04T09:05:44.7212723Z inflating: build/lib/libaoti_custom_ops.so 2025-12-04T09:05:44.7213986Z inflating: build/lib/libc10d_cuda_test.so 2025-12-04T09:05:44.7218589Z inflating: build/lib/libshm.so 2025-12-04T09:05:44.9486011Z inflating: build/lib/libtorch_python.so 2025-12-04T09:05:44.9522109Z inflating: build/lib/libnnapi_backend.so 2025-12-04T09:05:44.9523031Z creating: build/bin/ 2025-12-04T09:05:44.9961848Z inflating: build/bin/protoc-3.13.0.0 2025-12-04T09:05:45.0400035Z inflating: build/bin/protoc 2025-12-04T09:05:45.0455485Z inflating: build/bin/c10_AllocatorConfig_test 2025-12-04T09:05:45.0511962Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2025-12-04T09:05:45.0567727Z inflating: build/bin/c10_DeviceGuard_test 2025-12-04T09:05:45.0622835Z inflating: build/bin/c10_Device_test 2025-12-04T09:05:45.0686086Z inflating: build/bin/c10_DispatchKeySet_test 2025-12-04T09:05:45.0746777Z inflating: build/bin/c10_Scalar_test 2025-12-04T09:05:45.0801551Z inflating: build/bin/c10_StreamGuard_test 2025-12-04T09:05:45.0860570Z inflating: build/bin/c10_SymInt_test 2025-12-04T09:05:45.0923047Z inflating: build/bin/c10_InlineDeviceGuard_test 2025-12-04T09:05:45.0980531Z inflating: build/bin/c10_InlineStreamGuard_test 2025-12-04T09:05:45.1034820Z inflating: build/bin/c10_ConstexprCrc_test 2025-12-04T09:05:45.1094623Z inflating: build/bin/c10_SizesAndStrides_test 2025-12-04T09:05:45.1167310Z inflating: build/bin/c10_cow_test 2025-12-04T09:05:45.1223109Z inflating: build/bin/c10_Bitset_test 2025-12-04T09:05:45.1276798Z inflating: build/bin/c10_ArrayRef_test 2025-12-04T09:05:45.1330480Z inflating: build/bin/c10_DeadlockDetection_test 2025-12-04T09:05:45.1385805Z inflating: build/bin/c10_IntrusiveList_test 2025-12-04T09:05:45.1446576Z inflating: build/bin/c10_LeftRight_test 2025-12-04T09:05:45.1502796Z inflating: build/bin/c10_Half_test 2025-12-04T09:05:45.1557951Z inflating: build/bin/c10_Semaphore_test 2025-12-04T09:05:45.1618626Z inflating: build/bin/c10_Enumerate_test 2025-12-04T09:05:45.1676396Z inflating: build/bin/c10_NetworkFlow_test 
2025-12-04T09:05:45.1732257Z inflating: build/bin/c10_Synchronized_test 2025-12-04T09:05:45.1791952Z inflating: build/bin/c10_ThreadLocal_test 2025-12-04T09:05:45.1846388Z inflating: build/bin/c10_accumulate_test 2025-12-04T09:05:45.1902621Z inflating: build/bin/c10_TypeIndex_test 2025-12-04T09:05:45.1957821Z inflating: build/bin/c10_bit_cast_test 2025-12-04T09:05:45.2016233Z inflating: build/bin/c10_bfloat16_test 2025-12-04T09:05:45.2077155Z inflating: build/bin/c10_complex_math_test 2025-12-04T09:05:45.2133657Z inflating: build/bin/c10_exception_test 2025-12-04T09:05:45.2185965Z inflating: build/bin/c10_error_test 2025-12-04T09:05:45.2245271Z inflating: build/bin/c10_complex_test 2025-12-04T09:05:45.2299545Z inflating: build/bin/c10_flags_test 2025-12-04T09:05:45.2354545Z inflating: build/bin/c10_generic_math_test 2025-12-04T09:05:45.2513173Z inflating: build/bin/c10_intrusive_ptr_test 2025-12-04T09:05:45.2567451Z inflating: build/bin/c10_irange_test 2025-12-04T09:05:45.2623796Z inflating: build/bin/c10_lazy_test 2025-12-04T09:05:45.2678123Z inflating: build/bin/c10_nofatal_test 2025-12-04T09:05:45.2740576Z inflating: build/bin/c10_logging_test 2025-12-04T09:05:45.2818269Z inflating: build/bin/c10_optional_test 2025-12-04T09:05:45.2883893Z inflating: build/bin/c10_ordered_preserving_dict_test 2025-12-04T09:05:45.3039794Z inflating: build/bin/c10_small_vector_test 2025-12-04T09:05:45.3097267Z inflating: build/bin/c10_registry_test 2025-12-04T09:05:45.3158009Z inflating: build/bin/c10_string_util_test 2025-12-04T09:05:45.3213424Z inflating: build/bin/c10_ssize_test 2025-12-04T09:05:45.3265190Z inflating: build/bin/c10_string_view_test 2025-12-04T09:05:45.3315018Z inflating: build/bin/c10_intrusive_ptr_benchmark 2025-12-04T09:05:45.3367856Z inflating: build/bin/c10_tempfile_test 2025-12-04T09:05:45.3428339Z inflating: build/bin/c10_typeid_test 2025-12-04T09:05:45.3483407Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_1_var_test 2025-12-04T09:05:45.3540931Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_stream 2025-12-04T09:05:45.3598787Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads 2025-12-04T09:05:45.3651375Z inflating: build/bin/c10_cuda_CUDATest 2025-12-04T09:05:45.3710934Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device 2025-12-04T09:05:45.3769046Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_from_2_processes 2025-12-04T09:05:45.3825361Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks 2025-12-04T09:05:45.3881908Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block 2025-12-04T09:05:45.4455064Z inflating: build/bin/vec_test_all_types_DEFAULT 2025-12-04T09:05:45.5040263Z inflating: build/bin/vec_test_all_types_AVX512 2025-12-04T09:05:45.5637447Z inflating: build/bin/vec_test_all_types_AVX2 2025-12-04T09:05:45.5690391Z inflating: build/bin/test_vec_half_DEFAULT 2025-12-04T09:05:45.5793839Z inflating: build/bin/test_aoti_abi_check 2025-12-04T09:05:45.5847463Z inflating: build/bin/test_vec_half_AVX512 2025-12-04T09:05:45.5901536Z inflating: build/bin/test_vec_half_AVX2 2025-12-04T09:05:45.5978098Z inflating: build/bin/Dict_test 2025-12-04T09:05:45.6035602Z inflating: build/bin/Dimname_test 2025-12-04T09:05:45.6106102Z inflating: build/bin/MaybeOwned_test 2025-12-04T09:05:45.6166225Z inflating: build/bin/NamedTensor_test 2025-12-04T09:05:45.6229762Z inflating: build/bin/apply_utils_test 2025-12-04T09:05:45.6293769Z inflating: 
build/bin/atest 2025-12-04T09:05:45.6360086Z inflating: build/bin/basic 2025-12-04T09:05:45.6418672Z inflating: build/bin/broadcast_test 2025-12-04T09:05:45.6474329Z inflating: build/bin/cpu_allocator_test 2025-12-04T09:05:45.6535982Z inflating: build/bin/cpu_generator_test 2025-12-04T09:05:45.6594623Z inflating: build/bin/cpu_profiling_allocator_test 2025-12-04T09:05:45.6689942Z inflating: build/bin/cpu_rng_test 2025-12-04T09:05:45.6748593Z inflating: build/bin/dlconvertor_test 2025-12-04T09:05:45.6809758Z inflating: build/bin/extension_backend_test 2025-12-04T09:05:45.6868846Z inflating: build/bin/half_test 2025-12-04T09:05:45.6970243Z inflating: build/bin/ivalue_test 2025-12-04T09:05:45.7022891Z inflating: build/bin/lazy_tensor_test 2025-12-04T09:05:45.7080692Z inflating: build/bin/math_kernel_test 2025-12-04T09:05:45.7135584Z inflating: build/bin/memory_format_test 2025-12-04T09:05:45.7194281Z inflating: build/bin/memory_overlapping_test 2025-12-04T09:05:45.7250505Z inflating: build/bin/mobile_memory_cleanup 2025-12-04T09:05:45.7312322Z inflating: build/bin/native_test 2025-12-04T09:05:45.7366653Z inflating: build/bin/operator_name_test 2025-12-04T09:05:45.7420391Z inflating: build/bin/operators_test 2025-12-04T09:05:45.7477674Z inflating: build/bin/packedtensoraccessor_test 2025-12-04T09:05:45.7549879Z inflating: build/bin/pow_test 2025-12-04T09:05:45.7609868Z inflating: build/bin/quantized_test 2025-12-04T09:05:45.7662400Z inflating: build/bin/reduce_ops_test 2025-12-04T09:05:45.7720439Z inflating: build/bin/reportMemoryUsage_test 2025-12-04T09:05:45.7778931Z inflating: build/bin/scalar_tensor_test 2025-12-04T09:05:45.7840630Z inflating: build/bin/scalar_test 2025-12-04T09:05:45.7896898Z inflating: build/bin/StorageUtils_test 2025-12-04T09:05:45.7953478Z inflating: build/bin/stride_properties_test 2025-12-04T09:05:45.8037576Z inflating: build/bin/tensor_iterator_test 2025-12-04T09:05:45.8095629Z inflating: build/bin/test_parallel 2025-12-04T09:05:45.8151056Z inflating: build/bin/thread_init_test 2025-12-04T09:05:45.8210684Z inflating: build/bin/type_ptr_test 2025-12-04T09:05:45.8273406Z inflating: build/bin/type_test 2025-12-04T09:05:45.8329988Z inflating: build/bin/undefined_tensor_test 2025-12-04T09:05:45.8381864Z inflating: build/bin/verify_api_visibility 2025-12-04T09:05:45.8454961Z inflating: build/bin/legacy_vmap_test 2025-12-04T09:05:45.8513694Z inflating: build/bin/weakref_test 2025-12-04T09:05:45.8568439Z inflating: build/bin/wrapdim_test 2025-12-04T09:05:45.8622925Z inflating: build/bin/xla_tensor_test 2025-12-04T09:05:45.8685660Z inflating: build/bin/IListRef_test 2025-12-04T09:05:45.8796194Z inflating: build/bin/List_test 2025-12-04T09:05:45.8864876Z inflating: build/bin/KernelFunction_test 2025-12-04T09:05:45.8989816Z inflating: build/bin/kernel_function_legacy_test 2025-12-04T09:05:45.9087837Z inflating: build/bin/kernel_function_test 2025-12-04T09:05:45.9215061Z inflating: build/bin/kernel_lambda_legacy_test 2025-12-04T09:05:45.9321220Z inflating: build/bin/kernel_lambda_test 2025-12-04T09:05:45.9383972Z inflating: build/bin/kernel_stackbased_test 2025-12-04T09:05:45.9481837Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2025-12-04T09:05:45.9537927Z inflating: build/bin/CppSignature_test 2025-12-04T09:05:45.9597952Z inflating: build/bin/backend_fallback_test 2025-12-04T09:05:45.9651084Z inflating: build/bin/op_allowlist_test 2025-12-04T09:05:45.9959781Z inflating: build/bin/op_registration_test 2025-12-04T09:05:46.0031850Z inflating: 
build/bin/inline_container_test 2025-12-04T09:05:46.0088557Z inflating: build/bin/cuda_allocator_test 2025-12-04T09:05:46.0147800Z inflating: build/bin/cuda_apply_test 2025-12-04T09:05:46.0211525Z inflating: build/bin/cuda_atomic_ops_test 2025-12-04T09:05:46.0271453Z inflating: build/bin/cuda_caching_host_allocator_test 2025-12-04T09:05:46.0345418Z inflating: build/bin/cuda_complex_math_test 2025-12-04T09:05:46.0409380Z inflating: build/bin/cuda_complex_test 2025-12-04T09:05:46.0477552Z inflating: build/bin/cuda_cub_test 2025-12-04T09:05:46.0534148Z inflating: build/bin/cuda_cublas_handle_pool_test 2025-12-04T09:05:46.0588655Z inflating: build/bin/cuda_device_test 2025-12-04T09:05:46.0669990Z inflating: build/bin/cuda_distributions_test 2025-12-04T09:05:46.0727062Z inflating: build/bin/cuda_dlconvertor_test 2025-12-04T09:05:46.0785548Z inflating: build/bin/cuda_event_test 2025-12-04T09:05:46.0839870Z inflating: build/bin/cuda_exchange_device_test 2025-12-04T09:05:46.0901351Z inflating: build/bin/cuda_generator_test 2025-12-04T09:05:46.0957626Z inflating: build/bin/cuda_half_test 2025-12-04T09:05:46.1012530Z inflating: build/bin/cuda_allocatorTraceTracker_test 2025-12-04T09:05:46.1077714Z inflating: build/bin/cuda_stream_test 2025-12-04T09:05:46.1134547Z inflating: build/bin/cuda_reportMemoryUsage_test 2025-12-04T09:05:46.1186484Z inflating: build/bin/cuda_cudnn_test 2025-12-04T09:05:46.1242979Z inflating: build/bin/cuda_integer_divider_test 2025-12-04T09:05:46.1296991Z inflating: build/bin/cuda_optional_test 2025-12-04T09:05:46.1353809Z inflating: build/bin/cuda_packedtensoraccessor_test 2025-12-04T09:05:46.1411286Z inflating: build/bin/cuda_vectorized_test 2025-12-04T09:05:46.2487752Z inflating: build/bin/test_jit 2025-12-04T09:05:46.2837462Z inflating: build/bin/test_lazy 2025-12-04T09:05:46.2894529Z inflating: build/bin/BackoffTest 2025-12-04T09:05:46.2951876Z inflating: build/bin/FileStoreTest 2025-12-04T09:05:46.3012888Z inflating: build/bin/TCPStoreTest 2025-12-04T09:05:46.3071328Z inflating: build/bin/HashStoreTest 2025-12-04T09:05:46.3084962Z inflating: build/bin/ProcessGroupMPITest 2025-12-04T09:05:46.3087943Z inflating: build/bin/example_allreduce 2025-12-04T09:05:46.3148879Z inflating: build/bin/test_dist_autograd 2025-12-04T09:05:46.3220459Z inflating: build/bin/test_cpp_rpc 2025-12-04T09:05:46.3291300Z inflating: build/bin/ProcessGroupGlooTest 2025-12-04T09:05:46.3352910Z inflating: build/bin/ProcessGroupGlooAsyncTest 2025-12-04T09:05:46.3419911Z inflating: build/bin/ProcessGroupNCCLTest 2025-12-04T09:05:46.3484938Z inflating: build/bin/ProcessGroupNCCLErrorsTest 2025-12-04T09:05:46.4635063Z inflating: build/bin/test_api 2025-12-04T09:05:46.4636376Z inflating: build/bin/parallel_benchmark 2025-12-04T09:05:46.4640162Z inflating: build/bin/torch_shm_manager 2025-12-04T09:05:46.4640537Z creating: .additional_ci_files/ 2025-12-04T09:05:46.4708024Z inflating: .additional_ci_files/test-times.json 2025-12-04T09:05:46.4934608Z inflating: .additional_ci_files/test-class-times.json 2025-12-04T09:05:46.4961634Z ##[group]Run rm artifacts.zip 2025-12-04T09:05:46.4961950Z rm artifacts.zip 2025-12-04T09:05:46.4967930Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:46.4968313Z env: 2025-12-04T09:05:46.4968683Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:46.4968966Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:46.4969284Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:46.4969660Z ##[endgroup] 2025-12-04T09:05:46.5675497Z ##[group]Run df -H 
2025-12-04T09:05:46.5675766Z df -H 2025-12-04T09:05:46.5682023Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:46.5682463Z env: 2025-12-04T09:05:46.5682840Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:46.5683137Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:46.5683504Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:46.5683920Z ##[endgroup] 2025-12-04T09:05:46.5728775Z Filesystem Size Used Avail Use% Mounted on 2025-12-04T09:05:46.5729356Z devtmpfs 4.2M 0 4.2M 0% /dev 2025-12-04T09:05:46.5729741Z tmpfs 101G 0 101G 0% /dev/shm 2025-12-04T09:05:46.5730129Z tmpfs 41G 693k 41G 1% /run 2025-12-04T09:05:46.5730640Z /dev/nvme0n1p1 161G 54G 108G 34% / 2025-12-04T09:05:46.5731153Z tmpfs 101G 17k 101G 1% /tmp 2025-12-04T09:05:46.5731538Z /dev/nvme0n1p128 11M 1.4M 9.2M 13% /boot/efi 2025-12-04T09:05:46.5731942Z tmpfs 21G 0 21G 0% /run/user/0 2025-12-04T09:05:46.5768432Z Prepare all required actions 2025-12-04T09:05:46.5769288Z Getting action download info 2025-12-04T09:05:46.7491598Z ##[group]Run ./.github/actions/download-td-artifacts 2025-12-04T09:05:46.7491980Z with: 2025-12-04T09:05:46.7492191Z env: 2025-12-04T09:05:46.7492399Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:46.7492676Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:46.7493091Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:46.7493639Z ##[endgroup] 2025-12-04T09:05:46.7539093Z ##[group]Run seemethere/download-artifact-s3@v4 2025-12-04T09:05:46.7539493Z with: 2025-12-04T09:05:46.7539724Z name: td_results 2025-12-04T09:05:46.7540002Z s3-bucket: gha-artifacts 2025-12-04T09:05:46.7540310Z region: us-east-1 2025-12-04T09:05:46.7540557Z env: 2025-12-04T09:05:46.7540802Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:46.7541111Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:46.7541466Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:46.7541949Z ##[endgroup] 2025-12-04T09:05:47.2344599Z (node:62976) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-12-04T09:05:47.2345334Z 2025-12-04T09:05:47.2345590Z Please migrate your code to use AWS SDK for JavaScript (v3). 
2025-12-04T09:05:47.2346205Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-12-04T09:05:47.3415139Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-12-04T09:05:47.3415800Z Found 1 objects with prefix pytorch/pytorch/19922768520/td_results/ 2025-12-04T09:05:47.3416536Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json 2025-12-04T09:05:47.4119923Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json 2025-12-04T09:05:47.4124127Z Artifact download has finished successfully 2025-12-04T09:05:47.4293587Z ##[group]Run mkdir -p .additional_ci_files 2025-12-04T09:05:47.4294106Z mkdir -p .additional_ci_files 2025-12-04T09:05:47.4294614Z mv td_results.json .additional_ci_files/td_results.json || true 2025-12-04T09:05:47.4302069Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:47.4302524Z env: 2025-12-04T09:05:47.4302758Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:47.4303073Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:47.4303439Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:47.4303846Z ##[endgroup] 2025-12-04T09:05:47.4402935Z ##[group]Run .github/scripts/parse_ref.py 2025-12-04T09:05:47.4403328Z .github/scripts/parse_ref.py 2025-12-04T09:05:47.4408692Z shell: /usr/bin/bash -e {0} 2025-12-04T09:05:47.4408961Z env: 2025-12-04T09:05:47.4409182Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:47.4409458Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:47.4409771Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:47.4410146Z ##[endgroup] 2025-12-04T09:05:47.4628476Z Setting output branch=main 2025-12-04T09:05:47.4765486Z Prepare all required actions 2025-12-04T09:05:47.4765882Z Getting action download info 2025-12-04T09:05:47.6550993Z ##[group]Run ./.github/actions/filter-test-configs 2025-12-04T09:05:47.6551525Z with: 2025-12-04T09:05:47.6551955Z github-token: *** 2025-12-04T09:05:47.6563028Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": 
"distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]} 2025-12-04T09:05:47.6574390Z job-name: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 1, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:47.6575099Z env: 2025-12-04T09:05:47.6575348Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:47.6575661Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:47.6576014Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:47.6576427Z ##[endgroup] 2025-12-04T09:05:47.6613628Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T09:05:47.6613982Z with: 2025-12-04T09:05:47.6614283Z shell: bash 2025-12-04T09:05:47.6614526Z timeout_minutes: 10 2025-12-04T09:05:47.6614809Z max_attempts: 5 2025-12-04T09:05:47.6615079Z retry_wait_seconds: 30 2025-12-04T09:05:47.6616169Z command: set -eux # PyYAML 6.0 doesn't work with MacOS x86 anymore # This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2 python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T09:05:47.6617147Z polling_interval_seconds: 1 2025-12-04T09:05:47.6617587Z warning_on_retry: true 2025-12-04T09:05:47.6617896Z continue_on_error: false 2025-12-04T09:05:47.6618179Z env: 2025-12-04T09:05:47.6618448Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:47.6618753Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:47.6619098Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:47.6619696Z GITHUB_TOKEN: *** 2025-12-04T09:05:47.6619968Z ##[endgroup] 2025-12-04T09:05:47.7618280Z + python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T09:05:48.0121008Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T09:05:48.1317464Z Collecting requests==2.27.1 2025-12-04T09:05:48.1523382Z Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB) 2025-12-04T09:05:48.3395752Z Collecting pyyaml==6.0.2 2025-12-04T09:05:48.3433722Z Downloading PyYAML-6.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (737 kB) 2025-12-04T09:05:48.3681019Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (1.25.10) 2025-12-04T09:05:48.3689776Z 
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (2.10) 2025-12-04T09:05:48.4187679Z Collecting certifi>=2017.4.17 2025-12-04T09:05:48.4250683Z Downloading certifi-2025.11.12-py3-none-any.whl (159 kB) 2025-12-04T09:05:48.8431074Z Collecting charset-normalizer~=2.0.0 2025-12-04T09:05:48.8470464Z Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB) 2025-12-04T09:05:48.9413089Z Installing collected packages: charset-normalizer, certifi, requests, pyyaml 2025-12-04T09:05:49.0702752Z Successfully installed certifi-2025.11.12 charset-normalizer-2.0.12 pyyaml-6.0.2 requests-2.27.1 2025-12-04T09:05:49.7435789Z Command completed after 1 attempt(s). 2025-12-04T09:05:49.7490048Z ##[group]Run set -x 2025-12-04T09:05:49.7490310Z set -x 2025-12-04T09:05:49.7490548Z  2025-12-04T09:05:49.7490949Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T09:05:49.7491456Z # in runner workspace 2025-12-04T09:05:49.7491861Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py" 2025-12-04T09:05:49.7499451Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:49.7499970Z env: 2025-12-04T09:05:49.7500183Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:49.7500464Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:49.7500791Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:49.7501147Z ##[endgroup] 2025-12-04T09:05:49.7526700Z + python3 /home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py 2025-12-04T09:05:49.7718540Z Setting output branch=main 2025-12-04T09:05:49.7772503Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T09:05:49.7773065Z echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T09:05:49.7773605Z echo "Job name: ${JOB_NAME}" 2025-12-04T09:05:49.7773951Z  2025-12-04T09:05:49.7774455Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T09:05:49.7775042Z # in runner workspace 2025-12-04T09:05:49.7775541Z python3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \ 2025-12-04T09:05:49.7776091Z  --workflow "${GITHUB_WORKFLOW}" \ 2025-12-04T09:05:49.7776484Z  --job-name "${JOB_NAME}" \ 2025-12-04T09:05:49.7788699Z  --test-matrix "{"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, 
"num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]}" \ 2025-12-04T09:05:49.7800160Z  --selected-test-configs "" \ 2025-12-04T09:05:49.7800517Z  --pr-number "${PR_NUMBER}" \ 2025-12-04T09:05:49.7800831Z  --tag "${TAG}" \ 2025-12-04T09:05:49.7801135Z  --event-name "${EVENT_NAME}" \ 2025-12-04T09:05:49.7801469Z  --schedule "${SCHEDULE}" \ 2025-12-04T09:05:49.7801796Z  --branch "${HEAD_BRANCH}" 2025-12-04T09:05:49.7807319Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:49.7807714Z env: 2025-12-04T09:05:49.7807928Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:49.7808219Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:49.7808547Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:49.7809140Z GITHUB_TOKEN: *** 2025-12-04T09:05:49.7809711Z JOB_NAME: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 1, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:49.7810356Z PR_NUMBER: 2025-12-04T09:05:49.7810585Z TAG: 2025-12-04T09:05:49.7810793Z EVENT_NAME: schedule 2025-12-04T09:05:49.7811051Z SCHEDULE: 29 8 * * * 2025-12-04T09:05:49.7811308Z HEAD_BRANCH: main 2025-12-04T09:05:49.7811550Z ##[endgroup] 2025-12-04T09:05:49.7835398Z Workflow: trunk 2025-12-04T09:05:49.7836273Z Job name: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 1, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:49.9792398Z Setting output keep-going=True 2025-12-04T09:05:49.9792928Z Setting output ci-verbose-test-logs=False 2025-12-04T09:05:49.9793342Z Setting output ci-test-showlocals=False 2025-12-04T09:05:49.9793975Z Setting output ci-no-test-timeout=False 2025-12-04T09:05:49.9794344Z Setting output ci-no-td=False 2025-12-04T09:05:49.9794679Z Setting output ci-td-distributed=False 2025-12-04T09:05:49.9795047Z Setting output is-unstable=False 2025-12-04T09:05:49.9795392Z Setting output 
reenabled-issues= 2025-12-04T09:05:49.9821703Z Setting output test-matrix={"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", 
"shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]} 2025-12-04T09:05:49.9845497Z Setting output is-test-matrix-empty=False 2025-12-04T09:05:49.9916888Z ##[group]Run echo "Filtered matrix:" 2025-12-04T09:05:49.9917346Z echo "Filtered matrix:" 
2025-12-04T09:05:49.9943353Z echo "{"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": 
"lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]}" 2025-12-04T09:05:49.9968303Z  2025-12-04T09:05:49.9968523Z echo 2025-12-04T09:05:49.9968809Z echo "Is the current job unstable? 
False" 2025-12-04T09:05:49.9969147Z  2025-12-04T09:05:49.9969356Z echo 2025-12-04T09:05:49.9969620Z echo "Is keep-going label set? True" 2025-12-04T09:05:49.9969942Z  2025-12-04T09:05:49.9970151Z echo 2025-12-04T09:05:49.9970396Z echo "Reenabled issues? " 2025-12-04T09:05:49.9976558Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:49.9977007Z env: 2025-12-04T09:05:49.9977254Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:49.9994495Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:49.9994883Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:49.9995278Z ##[endgroup] 2025-12-04T09:05:50.0020765Z Filtered matrix: 2025-12-04T09:05:50.0052008Z {include: [{config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 5, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 5, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 5, num_shards: 5, runner: 
lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 5, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}]} 2025-12-04T09:05:50.0077242Z 2025-12-04T09:05:50.0077358Z Is the current job unstable? False 2025-12-04T09:05:50.0077570Z 2025-12-04T09:05:50.0077688Z Is keep-going label set? True 2025-12-04T09:05:50.0077877Z 2025-12-04T09:05:50.0077987Z Reenabled issues? 
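[editor's note] The filtered matrix printed above is what the test job's matrix strategy consumes; each shard then locates its own entry by config name, shard number, and flags such as mem_leak_check. Below is a minimal Python sketch of that lookup, for illustration only: it is not the actual filter_test_configs.py logic, and the file name matrix.json (the JSON printed after "Setting output test-matrix=") is an assumption.

    import json

    # Illustrative sketch, not the real filter_test_configs.py implementation.
    # Returns the first matrix entry matching this job's config, shard, and flag.
    def pick_entry(matrix, config, shard, flag=None):
        for entry in matrix["include"]:
            if entry["config"] == config and entry["shard"] == shard:
                if flag is None or flag in entry:
                    return entry
        return None

    # Assumes the filtered test-matrix JSON was saved to matrix.json beforehand.
    with open("matrix.json") as f:
        matrix = json.load(f)

    # This job is "distributed", shard 1 of 3, with mem_leak_check enabled.
    print(pick_entry(matrix, "distributed", 1, flag="mem_leak_check"))

In this run the lookup would land on one of the {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"} entries shown in the output above.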
2025-12-04T09:05:50.0112042Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T09:05:50.0112577Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T09:05:50.0117871Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:50.0118268Z env: 2025-12-04T09:05:50.0118481Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:50.0118755Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:50.0119076Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:50.0119440Z JOB_TIMEOUT: 600 2025-12-04T09:05:50.0119681Z ##[endgroup] 2025-12-04T09:05:50.0167651Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:05:50.0168194Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:05:50.0168681Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:05:50.0174319Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:50.0174751Z env: 2025-12-04T09:05:50.0175000Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:50.0175298Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:50.0175674Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:50.0176092Z ##[endgroup] 2025-12-04T09:05:50.0273804Z ##[group]Run set -x 2025-12-04T09:05:50.0274144Z set -x 2025-12-04T09:05:50.0274377Z  2025-12-04T09:05:50.0274637Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2025-12-04T09:05:50.0275039Z  TEST_COMMAND=.ci/pytorch/multigpu-test.sh 2025-12-04T09:05:50.0275453Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2025-12-04T09:05:50.0275832Z  TEST_COMMAND=.ci/onnx/test.sh 2025-12-04T09:05:50.0276137Z else 2025-12-04T09:05:50.0276396Z  TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T09:05:50.0276713Z fi 2025-12-04T09:05:50.0276926Z  2025-12-04T09:05:50.0277184Z # Leaving 1GB for the runner and other things 2025-12-04T09:05:50.0277799Z TOTAL_AVAILABLE_MEMORY_IN_GB=$(awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo) 2025-12-04T09:05:50.0278933Z # https://docs.docker.com/engine/containers/resource_constraints/#--memory-swap-details, the 3GB swap 2025-12-04T09:05:50.0280038Z # comes from https://github.com/pytorch/test-infra/pull/6058 2025-12-04T09:05:50.0280670Z TOTAL_MEMORY_WITH_SWAP=$(("${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}" + 3)) 2025-12-04T09:05:50.0281159Z  2025-12-04T09:05:50.0281453Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-12-04T09:05:50.0281860Z  SHM_OPTS= 2025-12-04T09:05:50.0282145Z  JENKINS_USER= 2025-12-04T09:05:50.0282553Z  # ensure that docker container cleanly exits in 12 hours 2025-12-04T09:05:50.0283101Z  # if for some reason cleanup action doesn't stop container 2025-12-04T09:05:50.0283565Z  # when job is cancelled 2025-12-04T09:05:50.0284067Z  DOCKER_SHELL_CMD="sleep 12h" 2025-12-04T09:05:50.0284443Z  USED_IMAGE="${DOCKER_IMAGE_S390X}" 2025-12-04T09:05:50.0284802Z else 2025-12-04T09:05:50.0285091Z  SHM_OPTS="--shm-size=${SHM_SIZE}" 2025-12-04T09:05:50.0285475Z  JENKINS_USER="--user jenkins" 2025-12-04T09:05:50.0285847Z  DOCKER_SHELL_CMD= 2025-12-04T09:05:50.0286179Z  USED_IMAGE="${DOCKER_IMAGE}" 2025-12-04T09:05:50.0286521Z fi 2025-12-04T09:05:50.0286747Z  2025-12-04T09:05:50.0287139Z # detached container should get cleaned up by teardown_ec2_linux 2025-12-04T09:05:50.0287764Z # TODO: Stop building test binaries as part of the build phase 2025-12-04T09:05:50.0288468Z # Used for GPU_FLAG, SHM_OPTS, JENKINS_USER and DOCKER_SHELL_CMD since that doesn't play nice 2025-12-04T09:05:50.0289100Z # shellcheck disable=SC2086,SC2090 
2025-12-04T09:05:50.0289488Z container_name=$(docker run \ 2025-12-04T09:05:50.0289857Z  ${GPU_FLAG:-} \ 2025-12-04T09:05:50.0290195Z  ${SCCACHE_SERVER_PORT_DOCKER_FLAG:-} \ 2025-12-04T09:05:50.0290597Z  -e BUILD_ENVIRONMENT \ 2025-12-04T09:05:50.0290942Z  -e PR_NUMBER \ 2025-12-04T09:05:50.0291344Z  -e GITHUB_ACTIONS \ 2025-12-04T09:05:50.0291659Z  -e GITHUB_REPOSITORY \ 2025-12-04T09:05:50.0291987Z  -e GITHUB_WORKFLOW \ 2025-12-04T09:05:50.0292288Z  -e GITHUB_JOB \ 2025-12-04T09:05:50.0292578Z  -e GITHUB_RUN_ID \ 2025-12-04T09:05:50.0292880Z  -e GITHUB_RUN_NUMBER \ 2025-12-04T09:05:50.0293292Z  -e GITHUB_RUN_ATTEMPT \ 2025-12-04T09:05:50.0293782Z  -e JOB_ID \ 2025-12-04T09:05:50.0294076Z  -e JOB_NAME \ 2025-12-04T09:05:50.0294398Z  -e BASE_SHA \ 2025-12-04T09:05:50.0294682Z  -e BRANCH \ 2025-12-04T09:05:50.0294965Z  -e SHA1 \ 2025-12-04T09:05:50.0295249Z  -e AWS_DEFAULT_REGION \ 2025-12-04T09:05:50.0295579Z  -e IN_WHEEL_TEST \ 2025-12-04T09:05:50.0295899Z  -e SHARD_NUMBER \ 2025-12-04T09:05:50.0296214Z  -e TEST_CONFIG \ 2025-12-04T09:05:50.0296520Z  -e NUM_TEST_SHARDS \ 2025-12-04T09:05:50.0297003Z  -e REENABLED_ISSUES \ 2025-12-04T09:05:50.0297353Z  -e CONTINUE_THROUGH_ERROR \ 2025-12-04T09:05:50.0297715Z  -e VERBOSE_TEST_LOGS \ 2025-12-04T09:05:50.0298046Z  -e TEST_SHOWLOCALS \ 2025-12-04T09:05:50.0298380Z  -e NO_TEST_TIMEOUT \ 2025-12-04T09:05:50.0298700Z  -e NO_TD \ 2025-12-04T09:05:50.0298978Z  -e TD_DISTRIBUTED \ 2025-12-04T09:05:50.0299300Z  -e PR_LABELS \ 2025-12-04T09:05:50.0299641Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2025-12-04T09:05:50.0300013Z  -e SCCACHE_BUCKET \ 2025-12-04T09:05:50.0300338Z  -e SCCACHE_REGION \ 2025-12-04T09:05:50.0300647Z  -e XLA_CUDA \ 2025-12-04T09:05:50.0300968Z  -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ 2025-12-04T09:05:50.0301379Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ 2025-12-04T09:05:50.0301801Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ 2025-12-04T09:05:50.0302227Z  -e SKIP_SCCACHE_INITIALIZATION=1 \ 2025-12-04T09:05:50.0302613Z  -e HUGGING_FACE_HUB_TOKEN \ 2025-12-04T09:05:50.0302989Z  -e VLLM_TEST_HUGGING_FACE_TOKEN \ 2025-12-04T09:05:50.0303385Z  -e SCRIBE_GRAPHQL_ACCESS_TOKEN \ 2025-12-04T09:05:50.0303745Z  -e DASHBOARD_TAG \ 2025-12-04T09:05:50.0304077Z  -e ARTIFACTS_FILE_SUFFIX \ 2025-12-04T09:05:50.0304498Z  --memory="${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}g" \ 2025-12-04T09:05:50.0304986Z  --memory-swap="${TOTAL_MEMORY_WITH_SWAP}g" \ 2025-12-04T09:05:50.0305667Z  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ 2025-12-04T09:05:50.0306075Z  --security-opt seccomp=unconfined \ 2025-12-04T09:05:50.0306498Z  --cap-add=SYS_PTRACE \ 2025-12-04T09:05:50.0306791Z  --ipc=host \ 2025-12-04T09:05:50.0307062Z  ${SHM_OPTS} \ 2025-12-04T09:05:50.0307325Z  --tty \ 2025-12-04T09:05:50.0307559Z  --detach \ 2025-12-04T09:05:50.0307845Z  --name="${container_name}" \ 2025-12-04T09:05:50.0308168Z  ${JENKINS_USER} \ 2025-12-04T09:05:50.0308527Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2025-12-04T09:05:50.0308931Z  -w /var/lib/jenkins/workspace \ 2025-12-04T09:05:50.0309258Z  "${USED_IMAGE}" \ 2025-12-04T09:05:50.0309542Z  ${DOCKER_SHELL_CMD} 2025-12-04T09:05:50.0309803Z ) 2025-12-04T09:05:50.0310145Z echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}" 2025-12-04T09:05:50.0310569Z  2025-12-04T09:05:50.0310828Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-12-04T09:05:50.0311432Z  docker exec -t "${container_name}" sh -c "python3 -m pip install -r .ci/docker/requirements-ci.txt" 2025-12-04T09:05:50.0311980Z fi 2025-12-04T09:05:50.0312187Z  
2025-12-04T09:05:50.0312693Z docker exec -t "${container_name}" sh -c "python3 -m pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}" 2025-12-04T09:05:50.0318137Z shell: /usr/bin/bash -e {0} 2025-12-04T09:05:50.0318413Z env: 2025-12-04T09:05:50.0318620Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:50.0318896Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:50.0319220Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:50.0319667Z BUILD_ENVIRONMENT: linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T09:05:50.0320040Z PR_NUMBER: 2025-12-04T09:05:50.0320296Z GITHUB_REPOSITORY: pytorch/pytorch 2025-12-04T09:05:50.0320605Z GITHUB_WORKFLOW: trunk 2025-12-04T09:05:50.0320870Z GITHUB_JOB: test 2025-12-04T09:05:50.0321117Z GITHUB_RUN_ID: 19922768520 2025-12-04T09:05:50.0321388Z GITHUB_RUN_NUMBER: 158165 2025-12-04T09:05:50.0321677Z GITHUB_RUN_ATTEMPT: 1 2025-12-04T09:05:50.0321936Z JOB_ID: 57116084892 2025-12-04T09:05:50.0322520Z JOB_NAME: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 1, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:50.0323238Z BRANCH: main 2025-12-04T09:05:50.0323519Z SHA1: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:05:50.0323928Z BASE_SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:05:50.0324282Z TEST_CONFIG: distributed 2025-12-04T09:05:50.0324556Z SHARD_NUMBER: 1 2025-12-04T09:05:50.0324796Z NUM_TEST_SHARDS: 3 2025-12-04T09:05:50.0325029Z EXTRA_FLAGS: 2025-12-04T09:05:50.0325272Z OP_BENCHMARK_TESTS: 2025-12-04T09:05:50.0325536Z REENABLED_ISSUES: 2025-12-04T09:05:50.0325790Z CONTINUE_THROUGH_ERROR: True 2025-12-04T09:05:50.0326084Z VERBOSE_TEST_LOGS: False 2025-12-04T09:05:50.0326363Z TEST_SHOWLOCALS: False 2025-12-04T09:05:50.0326632Z NO_TEST_TIMEOUT: False 2025-12-04T09:05:50.0326872Z NO_TD: False 2025-12-04T09:05:50.0327106Z TD_DISTRIBUTED: False 2025-12-04T09:05:50.0327430Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 2025-12-04T09:05:50.0327791Z SCCACHE_REGION: us-east-1 2025-12-04T09:05:50.0328058Z SHM_SIZE: 2g 2025-12-04T09:05:50.0328872Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:05:50.0330353Z DOCKER_IMAGE_S390X: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:05:50.0331241Z XLA_CUDA: 2025-12-04T09:05:50.0331614Z XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:05:50.0332092Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 1 2025-12-04T09:05:50.0332417Z PYTORCH_TEST_RERUN_DISABLED_TESTS: 0 2025-12-04T09:05:50.0332732Z DASHBOARD_TAG: 2025-12-04T09:05:50.0333293Z VLLM_TEST_HUGGING_FACE_TOKEN: *** 2025-12-04T09:05:50.0334015Z HUGGING_FACE_HUB_TOKEN: *** 2025-12-04T09:05:50.0334596Z SCRIBE_GRAPHQL_ACCESS_TOKEN: *** 2025-12-04T09:05:50.0335192Z ARTIFACTS_FILE_SUFFIX: test-distributed-1-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084892 2025-12-04T09:05:50.0335809Z ##[endgroup] 2025-12-04T09:05:50.0359268Z + [[ distributed == \m\u\l\t\i\g\p\u ]] 2025-12-04T09:05:50.0359749Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *onnx* ]] 2025-12-04T09:05:50.0360175Z + TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T09:05:50.0364058Z ++ awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo 2025-12-04T09:05:50.0384045Z + TOTAL_AVAILABLE_MEMORY_IN_GB='185.682 ' 2025-12-04T09:05:50.0384459Z + TOTAL_MEMORY_WITH_SWAP=188 
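Note on the memory limits computed above: the step reads MemTotal from /proc/meminfo, converts it to GB, and reserves 1 GB for the runner; the integer part becomes the container's --memory limit, and --memory-swap is that value plus 3 GB, so the container gets at most 3 GB of swap on top of its RAM budget (Docker's --memory-swap flag is the combined RAM+swap total). A minimal, self-contained bash sketch of the same arithmetic follows; the numbers it prints depend on the host, and on this runner the trace resolves to 185g/188g.
# Sketch: container memory budget as computed by the step above.
# Assumes a Linux host with /proc/meminfo; values differ per machine.
total_gb=$(awk '/MemTotal/ { printf "%.3f\n", $2/1024/1024 - 1 }' /proc/meminfo)  # leave 1 GB for the runner
swap_gb=$(( ${total_gb%.*} + 3 ))  # integer GB plus 3 GB of swap headroom
echo "--memory=${total_gb%.*}g --memory-swap=${swap_gb}g"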
2025-12-04T09:05:50.0384865Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *\s\3\9\0\x* ]] 2025-12-04T09:05:50.0385281Z + SHM_OPTS=--shm-size=2g 2025-12-04T09:05:50.0385596Z + JENKINS_USER='--user jenkins' 2025-12-04T09:05:50.0385917Z + DOCKER_SHELL_CMD= 2025-12-04T09:05:50.0386826Z + USED_IMAGE=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:05:50.0394009Z +++ nproc --ignore=2 2025-12-04T09:05:50.0428722Z ++ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e TD_DISTRIBUTED -e PR_LABELS -e MAX_JOBS=46 -e SCCACHE_BUCKET -e SCCACHE_REGION -e XLA_CUDA -e XLA_CLANG_CACHE_S3_BUCKET_NAME -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e SKIP_SCCACHE_INITIALIZATION=1 -e HUGGING_FACE_HUB_TOKEN -e VLLM_TEST_HUGGING_FACE_TOKEN -e SCRIBE_GRAPHQL_ACCESS_TOKEN -e DASHBOARD_TAG -e ARTIFACTS_FILE_SUFFIX --memory=185g --memory-swap=188g --env-file=/tmp/github_env_19922768520 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:06:03.1634130Z + container_name=f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T09:06:03.1634970Z + echo DOCKER_CONTAINER_ID=f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T09:06:03.1635650Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *\s\3\9\0\x* ]] 2025-12-04T09:06:03.1638471Z ++ echo dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl 2025-12-04T09:06:03.1640928Z + docker exec -t f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 sh -c 'python3 -m pip install dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl[opt-einsum] && .ci/pytorch/test.sh' 2025-12-04T09:06:03.6434755Z Processing ./dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl (from torch==2.10.0a0+gitffd9b0f) 2025-12-04T09:06:04.7834598Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.18.0) 2025-12-04T09:06:04.7836229Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (4.12.2) 2025-12-04T09:06:04.7841640Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.13.3) 2025-12-04T09:06:04.7847419Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2.8.8) 2025-12-04T09:06:04.7851678Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from 
torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.1.6) 2025-12-04T09:06:04.7857936Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2025.10.0) 2025-12-04T09:06:04.7879330Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.3.0) 2025-12-04T09:06:04.8369945Z Requirement already satisfied: numpy>=1.7 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.22.4) 2025-12-04T09:06:04.8397238Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.3.0) 2025-12-04T09:06:04.8468492Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.0.3) 2025-12-04T09:06:05.3726482Z Installing collected packages: torch 2025-12-04T09:06:19.1582396Z Successfully installed torch-2.10.0a0+gitffd9b0f 2025-12-04T09:06:19.2209786Z + export TERM=vt100 2025-12-04T09:06:19.2210147Z + TERM=vt100 2025-12-04T09:06:19.2210426Z ++ dirname .ci/pytorch/test.sh 2025-12-04T09:06:19.2217146Z + source .ci/pytorch/common.sh 2025-12-04T09:06:19.2220273Z +++ dirname .ci/pytorch/common.sh 2025-12-04T09:06:19.2227354Z ++ source .ci/pytorch/common_utils.sh 2025-12-04T09:06:19.2228635Z +++ declare -f -t trap_add 2025-12-04T09:06:19.2234098Z ++ set -ex -o pipefail 2025-12-04T09:06:19.2234444Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:06:19.2234850Z ++ BUILD_TEST_LIBTORCH=0 2025-12-04T09:06:19.2239140Z ++ dirname .ci/pytorch/test.sh 2025-12-04T09:06:19.2245475Z + source .ci/pytorch/common-build.sh 2025-12-04T09:06:19.2247095Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc11 != *win-* ]] 2025-12-04T09:06:19.2254498Z ++++ dirname .ci/pytorch/common-build.sh 2025-12-04T09:06:19.2262068Z +++ cd .ci/pytorch 2025-12-04T09:06:19.2262362Z +++ pwd -P 2025-12-04T09:06:19.2264092Z ++ script_dir=/var/lib/jenkins/workspace/.ci/pytorch 2025-12-04T09:06:19.2264590Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc11 == *-pch* ]] 2025-12-04T09:06:19.2264988Z ++ which sccache 2025-12-04T09:06:19.2277527Z ++ [[ -z ossci-compiler-cache-circleci-v2 ]] 2025-12-04T09:06:19.2277957Z ++ sccache --stop-server 2025-12-04T09:06:19.2304995Z ++ true 2025-12-04T09:06:19.2305439Z ++ rm -f /var/lib/jenkins/sccache_error.log 2025-12-04T09:06:19.2314095Z ++ trap_add sccache_epilogue EXIT 2025-12-04T09:06:19.2314482Z ++ trap_add_cmd=sccache_epilogue 2025-12-04T09:06:19.2314786Z ++ shift 2025-12-04T09:06:19.2315033Z ++ for trap_add_name in "$@" 2025-12-04T09:06:19.2321139Z ++++ trap -p EXIT 2025-12-04T09:06:19.2323462Z +++ eval 'extract_trap_cmd ' 2025-12-04T09:06:19.2323767Z ++++ extract_trap_cmd 2025-12-04T09:06:19.2324053Z ++++ printf '%s\n' '' 2025-12-04T09:06:19.2324345Z +++ printf '%s\n' sccache_epilogue 2025-12-04T09:06:19.2325662Z ++ trap -- ' 2025-12-04T09:06:19.2325922Z sccache_epilogue' EXIT 2025-12-04T09:06:19.2326249Z ++ [[ -n 1 ]] 2025-12-04T09:06:19.2326675Z ++ echo 'Skipping sccache server initialization, setting environment variables' 2025-12-04T09:06:19.2327364Z Skipping sccache server initialization, setting environment variables 2025-12-04T09:06:19.2327878Z ++ export 
SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:06:19.2328194Z ++ SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:06:19.2328595Z ++ export SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:06:19.2329101Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:06:19.2337691Z ++ export RUST_LOG=sccache::server=error 2025-12-04T09:06:19.2338105Z ++ RUST_LOG=sccache::server=error 2025-12-04T09:06:19.2338676Z ++ sccache --zero-stats 2025-12-04T09:06:19.3574222Z Statistics zeroed. 2025-12-04T09:06:19.3577817Z ++ which ccache 2025-12-04T09:06:19.3594254Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *rocm* ]] 2025-12-04T09:06:19.3594804Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *s390x* ]] 2025-12-04T09:06:19.3595241Z + [[ -d /var/lib/jenkins/workspace ]] 2025-12-04T09:06:19.3595619Z ++ stat -c %u /var/lib/jenkins/workspace 2025-12-04T09:06:19.3612206Z + WORKSPACE_ORIGINAL_OWNER_ID=1000 2025-12-04T09:06:19.3613040Z + trap_add cleanup_workspace EXIT 2025-12-04T09:06:19.3613451Z + trap_add_cmd=cleanup_workspace 2025-12-04T09:06:19.3613961Z + shift 2025-12-04T09:06:19.3614204Z + for trap_add_name in "$@" 2025-12-04T09:06:19.3615558Z +++ trap -p EXIT 2025-12-04T09:06:19.3617851Z ++ eval 'extract_trap_cmd trap -- '\'' 2025-12-04T09:06:19.3618237Z sccache_epilogue'\'' EXIT' 2025-12-04T09:06:19.3618555Z +++ extract_trap_cmd trap -- ' 2025-12-04T09:06:19.3618885Z sccache_epilogue' EXIT 2025-12-04T09:06:19.3619193Z +++ printf '%s\n' ' 2025-12-04T09:06:19.3619565Z sccache_epilogue' 2025-12-04T09:06:19.3619848Z ++ printf '%s\n' cleanup_workspace 2025-12-04T09:06:19.3620246Z + trap -- ' 2025-12-04T09:06:19.3620475Z sccache_epilogue 2025-12-04T09:06:19.3620750Z cleanup_workspace' EXIT 2025-12-04T09:06:19.3621099Z + sudo chown -R jenkins /var/lib/jenkins/workspace 2025-12-04T09:06:20.0150122Z + git config --global --add safe.directory /var/lib/jenkins/workspace 2025-12-04T09:06:20.0170724Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:06:20.0171378Z ++ python -c 'import os;import numba.cuda; print(os.path.dirname(numba.cuda.__file__))' 2025-12-04T09:06:20.4773402Z + NUMBA_CUDA_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:06:20.4774342Z + '[' -n /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ']' 2025-12-04T09:06:20.4774890Z +++ realpath .ci/pytorch/test.sh 2025-12-04T09:06:20.4782736Z ++ dirname /var/lib/jenkins/workspace/.ci/pytorch/test.sh 2025-12-04T09:06:20.4790301Z + NUMBA_PATCH=/var/lib/jenkins/workspace/.ci/pytorch/numba-cuda-13.patch 2025-12-04T09:06:20.4791074Z + pushd /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:06:20.4792032Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ~/workspace 2025-12-04T09:06:20.4792552Z + patch -p4 2025-12-04T09:06:20.4808115Z patching file cudadrv/driver.py 2025-12-04T09:06:20.4808548Z Hunk #1 succeeded at 357 (offset -8 lines). 
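The two trap_add calls traced above (sccache_epilogue earlier, cleanup_workspace here) chain EXIT handlers rather than overwriting them: the helper reads the trap already installed via `trap -p EXIT`, appends the new command, and re-installs the combined body, which is why the final trap shown in the trace contains both commands. Below is a minimal bash sketch of that chaining effect; it keeps the handlers in an array instead of re-parsing `trap -p` output, so it is a simplified, hypothetical stand-in for the real trap_add in .ci/pytorch/common_utils.sh, not its actual implementation.
#!/usr/bin/env bash
# Simplified sketch of chaining multiple EXIT handlers (hypothetical helper,
# not the trap_add used by the CI scripts).
declare -a _exit_cmds=()

run_exit_cmds() {
  local cmd
  for cmd in "${_exit_cmds[@]}"; do
    eval "$cmd"   # run each registered handler in order
  done
}

trap_add() {
  _exit_cmds+=("$1")
  trap run_exit_cmds EXIT   # (re)install the single dispatcher
}

trap_add 'echo sccache_epilogue'
trap_add 'echo cleanup_workspace'
# On exit, both handlers run in registration order.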
2025-12-04T09:06:20.4816582Z + popd 2025-12-04T09:06:20.4816830Z ~/workspace 2025-12-04T09:06:20.4817106Z + echo 'Environment variables:' 2025-12-04T09:06:20.4817447Z Environment variables: 2025-12-04T09:06:20.4817728Z + env 2025-12-04T09:06:20.4825861Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T09:06:20.4826409Z CONTINUE_THROUGH_ERROR=True 2025-12-04T09:06:20.4826779Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T09:06:20.4827424Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-12-04T09:06:20.4827777Z HOSTNAME=f2da02c9e7d7 2025-12-04T09:06:20.4828446Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_28887d98-6b20-42c1-92eb-104db062c4ed 2025-12-04T09:06:20.4829153Z GITHUB_ACTION=__run_3 2025-12-04T09:06:20.4829460Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:06:20.4829801Z GITHUB_RUN_NUMBER=158165 2025-12-04T09:06:20.4830081Z TEST_CONFIG=distributed 2025-12-04T09:06:20.4830382Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T09:06:20.4830748Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-12-04T09:06:20.4831100Z SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:06:20.4831526Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-12-04T09:06:20.4831859Z GITHUB_TRIGGERING_ACTOR=huydhn 2025-12-04T09:06:20.4832174Z GITHUB_REF_TYPE=branch 2025-12-04T09:06:20.4832493Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:20.4832871Z XLA_CUDA= 2025-12-04T09:06:20.4833126Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 2025-12-04T09:06:20.4833579Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T09:06:20.4834276Z *** 2025-12-04T09:06:20.4834522Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T09:06:20.4834835Z GITHUB_ACTIONS=true 2025-12-04T09:06:20.4835102Z NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:06:20.4835491Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:06:20.4835977Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:20.4836405Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:20.4836984Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk.yml@refs/heads/main 2025-12-04T09:06:20.4837519Z UCC_HOME=/usr 2025-12-04T09:06:20.4837772Z VERBOSE_TEST_LOGS=False 2025-12-04T09:06:20.4838046Z GITHUB_REF=refs/heads/main 2025-12-04T09:06:20.4838339Z SHARD_NUMBER=1 2025-12-04T09:06:20.4838603Z GITHUB_REF_PROTECTED=true 2025-12-04T09:06:20.4838899Z HOME=/var/lib/jenkins 2025-12-04T09:06:20.4839199Z GITHUB_API_URL=https://api.github.com 2025-12-04T09:06:20.4839570Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T09:06:20.4839956Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-12-04T09:06:20.4840329Z USE_SYSTEM_NCCL=1 2025-12-04T09:06:20.4840583Z NUM_TEST_SHARDS=3 2025-12-04T09:06:20.4840831Z UCX_HOME=/usr 2025-12-04T09:06:20.4841461Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_28887d98-6b20-42c1-92eb-104db062c4ed 2025-12-04T09:06:20.4842545Z JOB_NAME=linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 1, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:06:20.4843589Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_28887d98-6b20-42c1-92eb-104db062c4ed 2025-12-04T09:06:20.4844507Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2025-12-04T09:06:20.4845069Z GITHUB_EVENT_NAME=schedule 2025-12-04T09:06:20.4845371Z DASHBOARD_TAG= 2025-12-04T09:06:20.4845633Z GITHUB_RUN_ID=19922768520 2025-12-04T09:06:20.4845917Z INSTALLED_OPENBLAS= 2025-12-04T09:06:20.4846617Z 
GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_28887d98-6b20-42c1-92eb-104db062c4ed 2025-12-04T09:06:20.4847391Z GITHUB_ACTOR=huydhn 2025-12-04T09:06:20.4847638Z PR_NUMBER= 2025-12-04T09:06:20.4847882Z DESIRED_CUDA=12.8.1 2025-12-04T09:06:20.4848152Z GITHUB_RUN_ATTEMPT=1 2025-12-04T09:06:20.4848513Z ANACONDA_PYTHON_VERSION=3.10 2025-12-04T09:06:20.4848906Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T09:06:20.4849305Z TERM=vt100 2025-12-04T09:06:20.4849551Z INSTALLED_VISION=yes 2025-12-04T09:06:20.4849805Z BRANCH=main 2025-12-04T09:06:20.4850055Z SCCACHE_REGION=us-east-1 2025-12-04T09:06:20.4850363Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T09:06:20.4850670Z BUILD_AOT_INDUCTOR_TEST= 2025-12-04T09:06:20.4850968Z CUDA_PATH=/usr/local/cuda 2025-12-04T09:06:20.4851558Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-12-04T09:06:20.4852211Z GITHUB_SERVER_URL=https://github.com 2025-12-04T09:06:20.4852614Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-12-04T09:06:20.4853146Z REENABLED_ISSUES= 2025-12-04T09:06:20.4853563Z DOCS= 2025-12-04T09:06:20.4853789Z SHLVL=1 2025-12-04T09:06:20.4854065Z MAX_JOBS=46 2025-12-04T09:06:20.4854297Z GITHUB_ACTOR_ID=475357 2025-12-04T09:06:20.4854694Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:20.4855144Z GITHUB_REF_NAME=main 2025-12-04T09:06:20.4855573Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:06:20.4856071Z GITHUB_JOB=test 2025-12-04T09:06:20.4856335Z NO_TEST_TIMEOUT=False 2025-12-04T09:06:20.4856621Z TD_DISTRIBUTED=False 2025-12-04T09:06:20.4856913Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T09:06:20.4857264Z GITHUB_RETENTION_DAYS=90 2025-12-04T09:06:20.4857570Z OPENSSL_DIR=/opt/openssl 2025-12-04T09:06:20.4857863Z GITHUB_ACTION_REPOSITORY= 2025-12-04T09:06:20.4858776Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:06:20.4859818Z GITHUB_BASE_REF= 2025-12-04T09:06:20.4860069Z INSTALLED_ACL= 2025-12-04T09:06:20.4860601Z ARTIFACTS_FILE_SUFFIX=test-distributed-1-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084892 2025-12-04T09:06:20.4861207Z CI=true 2025-12-04T09:06:20.4861456Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T09:06:20.4861822Z RUST_LOG=sccache::server=error 2025-12-04T09:06:20.4862138Z JOB_ID=57116084892 2025-12-04T09:06:20.4862399Z GITHUB_HEAD_REF= 2025-12-04T09:06:20.4862651Z GITHUB_ACTION_REF= 2025-12-04T09:06:20.4862983Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-12-04T09:06:20.4863401Z TEST_SHOWLOCALS=False 2025-12-04T09:06:20.4863682Z GITHUB_WORKFLOW=trunk 2025-12-04T09:06:20.4863985Z DEBIAN_FRONTEND=noninteractive 2025-12-04T09:06:20.4864832Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_28887d98-6b20-42c1-92eb-104db062c4ed 2025-12-04T09:06:20.4865546Z NO_TD=False 2025-12-04T09:06:20.4865806Z SKIP_SCCACHE_INITIALIZATION=1 2025-12-04T09:06:20.4866149Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-12-04T09:06:20.4866490Z _=/usr/bin/env 2025-12-04T09:06:20.4866891Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:06:20.4867483Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2025-12-04T09:06:20.4968033Z + TORCH_INSTALL_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch 2025-12-04T09:06:20.4969220Z + 
TORCH_BIN_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-12-04T09:06:20.4969945Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib 2025-12-04T09:06:20.4970651Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/test 2025-12-04T09:06:20.4971155Z + BUILD_DIR=build 2025-12-04T09:06:20.4971438Z + BUILD_RENAMED_DIR=build_renamed 2025-12-04T09:06:20.4971772Z + BUILD_BIN_DIR=build/bin 2025-12-04T09:06:20.4972048Z + SHARD_NUMBER=1 2025-12-04T09:06:20.4972315Z + NUM_TEST_SHARDS=3 2025-12-04T09:06:20.4972607Z + export TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:06:20.4973117Z + TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:06:20.4973624Z + export VALGRIND=ON 2025-12-04T09:06:20.4973904Z + VALGRIND=ON 2025-12-04T09:06:20.4974226Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *clang9* ]] 2025-12-04T09:06:20.4976706Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *xpu* ]] 2025-12-04T09:06:20.4977144Z + detect_cuda_arch 2025-12-04T09:06:20.4977465Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:06:20.4977861Z + command -v nvidia-smi 2025-12-04T09:06:20.4978155Z /usr/bin/nvidia-smi 2025-12-04T09:06:20.4978499Z ++ nvidia-smi --query-gpu=compute_cap --format=csv 2025-12-04T09:06:20.4979313Z ++ tail -n 1 2025-12-04T09:06:20.5474700Z + TORCH_CUDA_ARCH_LIST=7.5 2025-12-04T09:06:20.5475188Z + export TORCH_CUDA_ARCH_LIST 2025-12-04T09:06:20.5475976Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *s390x* ]] 2025-12-04T09:06:20.5476450Z + [[ 0 == \1 ]] 2025-12-04T09:06:20.5476690Z + [[ True == \1 ]] 2025-12-04T09:06:20.5477019Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *bazel* ]] 2025-12-04T09:06:20.5477457Z ++ realpath build/custom_test_artifacts 2025-12-04T09:06:20.5483335Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts 2025-12-04T09:06:20.5483910Z + [[ -n '' ]] 2025-12-04T09:06:20.5484199Z + echo 'Environment variables' 2025-12-04T09:06:20.5484530Z Environment variables 2025-12-04T09:06:20.5484793Z + env 2025-12-04T09:06:20.5490382Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T09:06:20.5490916Z CONTINUE_THROUGH_ERROR=True 2025-12-04T09:06:20.5491309Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T09:06:20.5491949Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-12-04T09:06:20.5492303Z HOSTNAME=f2da02c9e7d7 2025-12-04T09:06:20.5493066Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_28887d98-6b20-42c1-92eb-104db062c4ed 2025-12-04T09:06:20.5493801Z GITHUB_ACTION=__run_3 2025-12-04T09:06:20.5494110Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:06:20.5494674Z GITHUB_RUN_NUMBER=158165 2025-12-04T09:06:20.5494982Z TEST_CONFIG=distributed 2025-12-04T09:06:20.5495283Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T09:06:20.5495664Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-12-04T09:06:20.5496040Z SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:06:20.5496489Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-12-04T09:06:20.5496838Z GITHUB_TRIGGERING_ACTOR=huydhn 2025-12-04T09:06:20.5497165Z GITHUB_REF_TYPE=branch 2025-12-04T09:06:20.5497446Z TORCH_CUDA_ARCH_LIST=7.5 2025-12-04T09:06:20.5497806Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:20.5498201Z XLA_CUDA= 2025-12-04T09:06:20.5498449Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 2025-12-04T09:06:20.5498979Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T09:06:20.5499344Z *** 2025-12-04T09:06:20.5499595Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T09:06:20.5499902Z GITHUB_ACTIONS=true 
2025-12-04T09:06:20.5500198Z NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:06:20.5500595Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:06:20.5501046Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:20.5501489Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:20.5502087Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk.yml@refs/heads/main 2025-12-04T09:06:20.5502637Z UCC_HOME=/usr 2025-12-04T09:06:20.5502888Z TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:06:20.5503205Z VERBOSE_TEST_LOGS=False 2025-12-04T09:06:20.5503502Z GITHUB_REF=refs/heads/main 2025-12-04T09:06:20.5503787Z SHARD_NUMBER=1 2025-12-04T09:06:20.5504053Z GITHUB_REF_PROTECTED=true 2025-12-04T09:06:20.5504350Z HOME=/var/lib/jenkins 2025-12-04T09:06:20.5504655Z GITHUB_API_URL=https://api.github.com 2025-12-04T09:06:20.5505148Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T09:06:20.5505533Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-12-04T09:06:20.5505902Z USE_SYSTEM_NCCL=1 2025-12-04T09:06:20.5506158Z NUM_TEST_SHARDS=3 2025-12-04T09:06:20.5506407Z UCX_HOME=/usr 2025-12-04T09:06:20.5507084Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_28887d98-6b20-42c1-92eb-104db062c4ed 2025-12-04T09:06:20.5508172Z JOB_NAME=linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 1, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:06:20.5509328Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_28887d98-6b20-42c1-92eb-104db062c4ed 2025-12-04T09:06:20.5510253Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2025-12-04T09:06:20.5510829Z GITHUB_EVENT_NAME=schedule 2025-12-04T09:06:20.5511112Z DASHBOARD_TAG= 2025-12-04T09:06:20.5511369Z GITHUB_RUN_ID=19922768520 2025-12-04T09:06:20.5511660Z INSTALLED_OPENBLAS= 2025-12-04T09:06:20.5512347Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_28887d98-6b20-42c1-92eb-104db062c4ed 2025-12-04T09:06:20.5513119Z GITHUB_ACTOR=huydhn 2025-12-04T09:06:20.5513380Z PR_NUMBER= 2025-12-04T09:06:20.5513617Z DESIRED_CUDA=12.8.1 2025-12-04T09:06:20.5513876Z GITHUB_RUN_ATTEMPT=1 2025-12-04T09:06:20.5514140Z VALGRIND=ON 2025-12-04T09:06:20.5514393Z ANACONDA_PYTHON_VERSION=3.10 2025-12-04T09:06:20.5514760Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T09:06:20.5515153Z TERM=vt100 2025-12-04T09:06:20.5515392Z INSTALLED_VISION=yes 2025-12-04T09:06:20.5515647Z BRANCH=main 2025-12-04T09:06:20.5515890Z SCCACHE_REGION=us-east-1 2025-12-04T09:06:20.5516191Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T09:06:20.5516497Z BUILD_AOT_INDUCTOR_TEST= 2025-12-04T09:06:20.5516790Z CUDA_PATH=/usr/local/cuda 2025-12-04T09:06:20.5517378Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-12-04T09:06:20.5518030Z GITHUB_SERVER_URL=https://github.com 2025-12-04T09:06:20.5518430Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-12-04T09:06:20.5518817Z REENABLED_ISSUES= 2025-12-04T09:06:20.5519064Z DOCS= 2025-12-04T09:06:20.5519269Z SHLVL=1 2025-12-04T09:06:20.5519488Z MAX_JOBS=46 2025-12-04T09:06:20.5519805Z GITHUB_ACTOR_ID=475357 2025-12-04T09:06:20.5520174Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:20.5520612Z GITHUB_REF_NAME=main 2025-12-04T09:06:20.5521039Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:06:20.5521510Z GITHUB_JOB=test 
2025-12-04T09:06:20.5521765Z NO_TEST_TIMEOUT=False 2025-12-04T09:06:20.5522039Z TD_DISTRIBUTED=False 2025-12-04T09:06:20.5522321Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T09:06:20.5522657Z GITHUB_RETENTION_DAYS=90 2025-12-04T09:06:20.5522951Z OPENSSL_DIR=/opt/openssl 2025-12-04T09:06:20.5523236Z GITHUB_ACTION_REPOSITORY= 2025-12-04T09:06:20.5524113Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:06:20.5525031Z GITHUB_BASE_REF= 2025-12-04T09:06:20.5525290Z INSTALLED_ACL= 2025-12-04T09:06:20.5525789Z ARTIFACTS_FILE_SUFFIX=test-distributed-1-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084892 2025-12-04T09:06:20.5526388Z CI=true 2025-12-04T09:06:20.5526636Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T09:06:20.5526984Z RUST_LOG=sccache::server=error 2025-12-04T09:06:20.5527291Z JOB_ID=57116084892 2025-12-04T09:06:20.5527548Z GITHUB_HEAD_REF= 2025-12-04T09:06:20.5527791Z GITHUB_ACTION_REF= 2025-12-04T09:06:20.5528111Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-12-04T09:06:20.5528510Z TEST_SHOWLOCALS=False 2025-12-04T09:06:20.5528775Z GITHUB_WORKFLOW=trunk 2025-12-04T09:06:20.5529064Z DEBIAN_FRONTEND=noninteractive 2025-12-04T09:06:20.5529764Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_28887d98-6b20-42c1-92eb-104db062c4ed 2025-12-04T09:06:20.5530472Z NO_TD=False 2025-12-04T09:06:20.5530729Z SKIP_SCCACHE_INITIALIZATION=1 2025-12-04T09:06:20.5531076Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-12-04T09:06:20.5531583Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:06:20.5532057Z _=/usr/bin/env 2025-12-04T09:06:20.5532314Z + echo 'Testing pytorch' 2025-12-04T09:06:20.5532605Z Testing pytorch 2025-12-04T09:06:20.5532859Z + export LANG=C.UTF-8 2025-12-04T09:06:20.5533227Z + LANG=C.UTF-8 2025-12-04T09:06:20.5533655Z + PR_NUMBER= 2025-12-04T09:06:20.5533998Z + [[ distributed == \d\e\f\a\u\l\t ]] 2025-12-04T09:06:20.5534378Z + [[ distributed == \d\i\s\t\r\i\b\u\t\e\d ]] 2025-12-04T09:06:20.5534808Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:06:20.5535205Z + [[ distributed == \s\l\o\w ]] 2025-12-04T09:06:20.5535628Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *slow-gradcheck* ]] 2025-12-04T09:06:20.5536133Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:06:20.5536562Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T09:06:20.5536966Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T09:06:20.5537334Z + [[ distributed == *crossref* ]] 2025-12-04T09:06:20.5537726Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:06:20.5538169Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *xpu* ]] 2025-12-04T09:06:20.5538628Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *-bazel-* ]] 2025-12-04T09:06:20.5539046Z + pip_install ninja==1.10.2 2025-12-04T09:06:20.5539457Z + pip_install_pkg='python3 -m pip install --progress-bar off' 2025-12-04T09:06:20.5539999Z + python3 -m pip install --progress-bar off ninja==1.10.2 2025-12-04T09:06:20.9412386Z Collecting ninja==1.10.2 2025-12-04T09:06:20.9675178Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB) 2025-12-04T09:06:20.9770318Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2025-12-04T09:06:21.3700311Z Installing collected packages: ninja 2025-12-04T09:06:21.3715278Z Attempting uninstall: ninja 
2025-12-04T09:06:21.3715676Z Found existing installation: ninja 1.11.1.4 2025-12-04T09:06:21.3734527Z Uninstalling ninja-1.11.1.4: 2025-12-04T09:06:21.3801597Z Successfully uninstalled ninja-1.11.1.4 2025-12-04T09:06:21.4123528Z Successfully installed ninja-1.10.2 2025-12-04T09:06:21.4659984Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:06:21.4661873Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:06:21.4663074Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *aarch64* ]] 2025-12-04T09:06:21.4663557Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *asan* ]] 2025-12-04T09:06:21.4664030Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *-debug* ]] 2025-12-04T09:06:21.4664492Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *-bazel-* ]] 2025-12-04T09:06:21.4665155Z + echo 'We are not in debug mode: linux-jammy-cuda12.8-py3.10-gcc11. Expect the assertion to pass' 2025-12-04T09:06:21.4666070Z We are not in debug mode: linux-jammy-cuda12.8-py3.10-gcc11. Expect the assertion to pass 2025-12-04T09:06:21.4666617Z + cd test 2025-12-04T09:06:21.4667009Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)' 2025-12-04T09:06:23.1702879Z + [[ distributed == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2025-12-04T09:06:23.1703386Z + [[ distributed == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2025-12-04T09:06:23.1703851Z + [[ distributed == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]] 2025-12-04T09:06:23.1708002Z + DYNAMO_BENCHMARK_FLAGS=() 2025-12-04T09:06:23.1708897Z + [[ distributed == *pr_time_benchmarks* ]] 2025-12-04T09:06:23.1709334Z + [[ distributed == *dynamo_eager* ]] 2025-12-04T09:06:23.1709684Z + [[ distributed == *aot_eager* ]] 2025-12-04T09:06:23.1710040Z + [[ distributed == *aot_inductor* ]] 2025-12-04T09:06:23.1710433Z + [[ distributed == *max_autotune_inductor* ]] 2025-12-04T09:06:23.1710805Z + [[ distributed == *inductor* ]] 2025-12-04T09:06:23.1711145Z + [[ distributed == *dynamic* ]] 2025-12-04T09:06:23.1711499Z + [[ distributed == *cpu* ]] 2025-12-04T09:06:23.1711801Z + [[ distributed == *xpu* ]] 2025-12-04T09:06:23.1712149Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda) 2025-12-04T09:06:23.1738025Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *libtorch* ]] 2025-12-04T09:06:23.1738867Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *-bazel-* ]] 2025-12-04T09:06:23.1740597Z + cd test 2025-12-04T09:06:23.1741234Z + python -c 'import torch; print(torch.__config__.show())' 2025-12-04T09:06:25.5221056Z PyTorch built with: 2025-12-04T09:06:25.5221394Z - GCC 11.4 2025-12-04T09:06:25.5221663Z - C++ Version: 201703 2025-12-04T09:06:25.5222343Z - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T09:06:25.5223197Z - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T09:06:25.5223726Z - OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-12-04T09:06:25.5224136Z - LAPACK is enabled (usually provided by MKL) 2025-12-04T09:06:25.5224561Z - NNPACK is enabled 2025-12-04T09:06:25.5224857Z - CPU capability usage: AVX512 2025-12-04T09:06:25.5225196Z - CUDA Runtime 12.8 2025-12-04T09:06:25.5225878Z - NVCC architecture flags: -gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_89,code=sm_89 2025-12-04T09:06:25.5226508Z - CuDNN 91.0.2 (built against CUDA 12.9) 2025-12-04T09:06:25.5232106Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=35b7a9a26c5923d98aebaa41a031dae21788a9ee, CUDA_VERSION=12.8, CUDNN_VERSION=9.10.2, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Werror -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=ON, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 2025-12-04T09:06:25.5237828Z 2025-12-04T09:06:25.9888533Z + cd test 2025-12-04T09:06:25.9889008Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2025-12-04T09:06:27.4134110Z ATen/Parallel: 2025-12-04T09:06:27.4134538Z at::get_num_threads() : 24 2025-12-04T09:06:27.4134921Z at::get_num_interop_threads() : 24 2025-12-04T09:06:27.4135289Z OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-12-04T09:06:27.4135639Z omp_get_max_threads() : 24 2025-12-04T09:06:27.4136344Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T09:06:27.4137036Z mkl_get_max_threads() : 24 2025-12-04T09:06:27.4137505Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T09:06:27.4138030Z std::thread::hardware_concurrency() : 48 2025-12-04T09:06:27.4138407Z Environment variables: 2025-12-04T09:06:27.4138695Z OMP_NUM_THREADS : [not set] 2025-12-04T09:06:27.4139014Z MKL_NUM_THREADS : [not set] 2025-12-04T09:06:27.4139339Z ATen parallel backend: OpenMP 2025-12-04T09:06:27.4139551Z 2025-12-04T09:06:27.6871719Z + [[ distributed == *numpy_2* ]] 2025-12-04T09:06:27.6872235Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *aarch64* ]] 2025-12-04T09:06:27.6872662Z + [[ distributed == *backward* ]] 2025-12-04T09:06:27.6873045Z + [[ distributed == *libtorch_agnostic_targetting* ]] 2025-12-04T09:06:27.6873456Z + [[ distributed == *xla* ]] 2025-12-04T09:06:27.6873782Z + [[ distributed == *vllm* ]] 2025-12-04T09:06:27.6874116Z + [[ distributed == *executorch* ]] 2025-12-04T09:06:27.6874476Z + [[ distributed == \j\i\t\_\l\e\g\a\c\y ]] 2025-12-04T09:06:27.6874860Z + [[ distributed == \q\u\a\n\t\i\z\a\t\i\o\n ]] 2025-12-04T09:06:27.6875605Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *libtorch* ]] 2025-12-04T09:06:27.6876043Z + [[ distributed == distributed ]] 2025-12-04T09:06:27.6876370Z + test_distributed 2025-12-04T09:06:27.6876666Z + echo 'Testing distributed python tests' 2025-12-04T09:06:27.6877029Z Testing distributed python tests 2025-12-04T09:06:27.6877500Z + python test/run_test.py --distributed-tests --shard 1 3 --verbose 2025-12-04T09:06:33.2345782Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json 2025-12-04T09:06:33.2871280Z Ignoring disabled issues: [''] 2025-12-04T09:06:33.2978371Z Found test times from artifacts 2025-12-04T09:06:33.3402006Z Found test times from artifacts 2025-12-04T09:06:33.3412995Z Running all tests 2025-12-04T09:06:33.3583207Z Running parallel tests on 1 processes 2025-12-04T09:06:33.3587608Z Name: tests to run (est. 
time: 133.38min) 2025-12-04T09:06:33.3587982Z Serial tests (85): 2025-12-04T09:06:33.3588353Z distributed/test_dynamo_distributed 1/2 2025-12-04T09:06:33.3588759Z distributed/fsdp/test_fsdp_apply 1/1 2025-12-04T09:06:33.3589180Z distributed/fsdp/test_fsdp_multiple_wrapping 1/1 2025-12-04T09:06:33.3589629Z distributed/fsdp/test_fsdp_fine_tune 1/1 2025-12-04T09:06:33.3590067Z distributed/fsdp/test_fsdp_dtensor_state_dict 1/1 2025-12-04T09:06:33.3590599Z distributed/fsdp/test_fsdp_core 1/2 2025-12-04T09:06:33.3591048Z distributed/algorithms/ddp_comm_hooks/test_ddp_hooks 1/1 2025-12-04T09:06:33.3591512Z distributed/tensor/test_op_schema 1/1 2025-12-04T09:06:33.3591909Z distributed/checkpoint/test_nested_dict 1/1 2025-12-04T09:06:33.3592384Z distributed/checkpoint/test_consolidate_hf_safetensors 1/1 2025-12-04T09:06:33.3593195Z distributed/checkpoint/_experimental/test_barriers 1/1 2025-12-04T09:06:33.3593670Z distributed/pipelining/test_transformer 1/1 2025-12-04T09:06:33.3594105Z distributed/flight_recorder/test_fr_analysis 1/1 2025-12-04T09:06:33.3594547Z distributed/_composable/test_contract 1/1 2025-12-04T09:06:33.3594964Z distributed/checkpoint/test_dedup_tensors 1/1 2025-12-04T09:06:33.3595363Z distributed/pipelining/test_pipe 1/1 2025-12-04T09:06:33.3595750Z distributed/pipelining/test_backward 1/1 2025-12-04T09:06:33.3596132Z distributed/test_nvshmem_triton 1/1 2025-12-04T09:06:33.3596499Z distributed/tensor/test_dtensor 1/1 2025-12-04T09:06:33.3596849Z distributed/test_p2p_ipc 1/1 2025-12-04T09:06:33.3597204Z distributed/tensor/test_common_rules 1/1 2025-12-04T09:06:33.3597630Z distributed/checkpoint/test_hf_safetensor_e2e 1/1 2025-12-04T09:06:33.3598039Z distributed/tensor/test_dynamic 1/1 2025-12-04T09:06:33.3598429Z distributed/checkpoint/e2e/test_fsdp_ep 1/1 2025-12-04T09:06:33.3598849Z distributed/pipelining/test_unflatten 1/1 2025-12-04T09:06:33.3599248Z distributed/tensor/test_dtensor_testbase 1/1 2025-12-04T09:06:33.3599664Z distributed/tensor/test_redistribute 1/2 2025-12-04T09:06:33.3600053Z distributed/test_nvshmem 1/1 2025-12-04T09:06:33.3600392Z distributed/tensor/test_attention 1/1 2025-12-04T09:06:33.3600793Z distributed/tensor/test_convolution_ops 1/1 2025-12-04T09:06:33.3601216Z distributed/checkpoint/fsdp/test_fsdp_dsd 1/1 2025-12-04T09:06:33.3601646Z distributed/checkpoint/test_save_load_api 1/1 2025-12-04T09:06:33.3602088Z distributed/tensor/debug/test_comm_mode_features 1/1 2025-12-04T09:06:33.3602524Z distributed/tensor/test_dtensor_ops 1/1 2025-12-04T09:06:33.3602892Z distributed/test_debug 1/1 2025-12-04T09:06:33.3603229Z distributed/test_overlap_bucketing_unit 1/1 2025-12-04T09:06:33.3603739Z distributed/checkpoint/_experimental/test_checkpoint_writer 1/1 2025-12-04T09:06:33.3604332Z distributed/checkpoint/_experimental/test_checkpointer 1/1 2025-12-04T09:06:33.3604793Z distributed/tensor/test_init 1/1 2025-12-04T09:06:33.3605175Z distributed/_composable/test_checkpoint 1/1 2025-12-04T09:06:33.3605595Z distributed/_tools/test_fsdp2_mem_tracker 1/1 2025-12-04T09:06:33.3606224Z distributed/_composable/test_replicate_mixed_precision 1/1 2025-12-04T09:06:33.3606716Z distributed/checkpoint/e2e/test_fine_tuning 1/1 2025-12-04T09:06:33.3607137Z distributed/tensor/test_matrix_ops 1/1 2025-12-04T09:06:33.3607527Z distributed/tensor/test_optimizers 1/1 2025-12-04T09:06:33.3607892Z distributed/test_symmetric_memory 1/1 2025-12-04T09:06:33.3608290Z distributed/_tools/test_runtime_estimator 1/1 2025-12-04T09:06:33.3608768Z 
distributed/_composable/test_replicate_with_compiler 1/1 2025-12-04T09:06:33.3609286Z distributed/_composable/fsdp/test_fully_shard_autograd 1/1 2025-12-04T09:06:33.3609880Z distributed/_composable/test_composability/test_2d_composability 1/1 2025-12-04T09:06:33.3610417Z distributed/fsdp/test_fsdp_optim_state 1/1 2025-12-04T09:06:33.3610802Z distributed/test_c10d_logger 1/1 2025-12-04T09:06:33.3611195Z distributed/_composable/test_replicate_training 1/1 2025-12-04T09:06:33.3611695Z distributed/optim/test_apply_optimizer_in_backward 1/1 2025-12-04T09:06:33.3612142Z distributed/rpc/test_share_memory 1/1 2025-12-04T09:06:33.3612512Z distributed/tensor/test_op_strategy 1/1 2025-12-04T09:06:33.3612989Z distributed/fsdp/test_fsdp_grad_acc 1/1 2025-12-04T09:06:33.3613592Z distributed/checkpoint/test_state_dict_stager 1/1 2025-12-04T09:06:33.3614051Z distributed/fsdp/test_fsdp_freezing_weights 1/1 2025-12-04T09:06:33.3614542Z distributed/_composable/fsdp/test_fully_shard_init 1/1 2025-12-04T09:06:33.3615027Z distributed/fsdp/test_fsdp_flatten_params 1/1 2025-12-04T09:06:33.3615453Z distributed/test_distributed_spawn 3/9 2025-12-04T09:06:33.3615834Z distributed/test_distributed_spawn 6/9 2025-12-04T09:06:33.3616305Z distributed/test_distributed_spawn 9/9 2025-12-04T09:06:33.3616694Z distributed/test_composability 1/1 2025-12-04T09:06:33.3617066Z distributed/test_multi_threaded_pg 1/1 2025-12-04T09:06:33.3617558Z distributed/_composable/fsdp/test_fully_shard_extensions 1/1 2025-12-04T09:06:33.3618045Z distributed/fsdp/test_wrap 1/1 2025-12-04T09:06:33.3618417Z distributed/fsdp/test_fsdp_hybrid_shard 1/1 2025-12-04T09:06:33.3618904Z distributed/_composable/fsdp/test_fully_shard_training 1/1 2025-12-04T09:06:33.3619408Z distributed/rpc/cuda/test_tensorpipe_agent 1/2 2025-12-04T09:06:33.3619892Z distributed/optim/test_zero_redundancy_optimizer 1/1 2025-12-04T09:06:33.3620337Z distributed/rpc/test_tensorpipe_agent 1/1 2025-12-04T09:06:33.3620731Z distributed/test_c10d_gloo 2/2 2025-12-04T09:06:33.3621082Z distributed/test_launcher 1/1 2025-12-04T09:06:33.3621416Z distributed/test_store 1/1 2025-12-04T09:06:33.3621749Z distributed/test_c10d_nccl 1/3 2025-12-04T09:06:33.3622116Z distributed/elastic/events/lib_test 1/1 2025-12-04T09:06:33.3622502Z distributed/elastic/metrics/api_test 1/1 2025-12-04T09:06:33.3622897Z distributed/elastic/timer/api_test 1/1 2025-12-04T09:06:33.3623333Z distributed/elastic/timer/local_timer_example 1/1 2025-12-04T09:06:33.3623779Z distributed/elastic/timer/local_timer_test 1/1 2025-12-04T09:06:33.3624235Z distributed/elastic/utils/distributed_test 1/1 2025-12-04T09:06:33.3624670Z distributed/elastic/utils/logging_test 1/1 2025-12-04T09:06:33.3625078Z distributed/elastic/utils/util_test 1/1 2025-12-04T09:06:33.3625537Z Parallel tests (0): 2025-12-04T09:06:33.3625834Z Name: excluded (est. time: 0.0min) 2025-12-04T09:06:33.3626156Z Serial tests (0): 2025-12-04T09:06:33.3626409Z Parallel tests (0): 2025-12-04T09:06:33.3626934Z Running distributed/test_dynamo_distributed 1/2 ... [2025-12-04 09:06:33.359501][819.461634414] 2025-12-04T09:06:33.3627631Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:06:33.3628919Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_dynamo_distributed.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 09:06:33.359849] 2025-12-04T09:13:04.3911296Z 2025-12-04T09:13:04.3912487Z distributed/test_dynamo_distributed 1/2 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_dynamo_distributed_1.2_812e4aab4ac83592_.log 2025-12-04T09:13:04.3931496Z Running 38 items in this shard: test/distributed/test_dynamo_distributed.py::TestFakeDistributedSingleProc::test_call_method_forward, test/distributed/test_dynamo_distributed.py::TestFakeDistributedSingleProc::test_ddp_optimizer_inductor_strides_dont_specialize, test/distributed/test_dynamo_distributed.py::TestFakeDistributedSingleProc::test_hf_bert_ddp_aot_eager, test/distributed/test_dynamo_distributed.py::TestFakeDistributedSingleProc::test_hf_bert_ddp_inductor, test/distributed/test_dynamo_distributed.py::TestFakeDistributedSingleProc::test_issue90375, test/distributed/test_dynamo_distributed.py::TestFakeDistributedSingleProc::test_unbacked_symbol_splitting_no_binding, test/distributed/test_dynamo_distributed.py::TestMultiProc::test_compiler_collectives_automatic_dynamic_scalar, test/distributed/test_dynamo_distributed.py::TestMultiProc::test_compiler_collectives_automatic_dynamic_speculation_divergence, test/distributed/test_dynamo_distributed.py::TestMultiProc::test_compiler_collectives_automatic_dynamic_tensor, test/distributed/test_dynamo_distributed.py::TestMultiProc::test_compiler_collectives_dim_mismatch, test/distributed/test_dynamo_distributed.py::TestMultiProc::test_compiler_collectives_scalar_missing_source, test/distributed/test_dynamo_distributed.py::TestMultiProc::test_compiler_collectives_type_mismatch, test/distributed/test_dynamo_distributed.py::TestMultiProc::test_ddp_activation_checkpointing, test/distributed/test_dynamo_distributed.py::TestMultiProc::test_ddp_baseline_aot_eager_multiprocess, test/distributed/test_dynamo_distributed.py::TestMultiProc::test_fsdp_aot_eager, test/distributed/test_dynamo_distributed.py::TestMultiProc::test_fsdp_unspecialized_forced_getattr_inline, test/distributed/test_dynamo_distributed.py::TestMultiProc::test_fsdp_unspecialized_forced_getattr_no_inline, test/distributed/test_dynamo_distributed.py::TestMultiProc::test_get_pg_attr, test/distributed/test_dynamo_distributed.py::TestMultiProc::test_guard_collective, test/distributed/test_dynamo_distributed.py::TestMultiProc::test_hf_bert_ddp_aot_eager_static_graph, test/distributed/test_dynamo_distributed.py::TestMultiProc::test_hf_bert_ddp_inductor, test/distributed/test_dynamo_distributed.py::TestMultiProc::test_hf_bert_ddp_inductor_static_graph, test/distributed/test_dynamo_distributed.py::TestMultiProc::test_multiproc_autotune, test/distributed/test_dynamo_distributed.py::TestSingleProc::test_compiled_flex_attention_full_model_ddp, test/distributed/test_dynamo_distributed.py::TestSingleProc::test_ddp_baseline_inductor, test/distributed/test_dynamo_distributed.py::TestSingleProc::test_fsdp_dup_tensors_diff_source, test/distributed/test_dynamo_distributed.py::TestSingleProc::test_fsdp_dup_tensors_same_source, test/distributed/test_dynamo_distributed.py::TestSingleProc::test_fsdp_orig_params_assert, test/distributed/test_dynamo_distributed.py::TestSingleProc::test_fsdp_skip_guards, test/distributed/test_dynamo_distributed.py::TestSingleProc::test_fsdp_skip_register_attr_or_module, test/distributed/test_dynamo_distributed.py::TestSingleProc::test_graph_split, test/distributed/test_dynamo_distributed.py::TestSingleProc::test_graph_split_ctx_manager, 
test/distributed/test_dynamo_distributed.py::TestSingleProc::test_graph_split_inductor, test/distributed/test_dynamo_distributed.py::TestSingleProc::test_graph_split_inductor_layout_optimizations_inference, test/distributed/test_dynamo_distributed.py::TestSingleProc::test_graph_split_inductor_transpose, test/distributed/test_dynamo_distributed.py::TestSingleProc::test_higher_order_op, test/distributed/test_dynamo_distributed.py::TestSingleProc::test_ignored_parameters, test/distributed/test_dynamo_distributed.py::TestSingleProc::test_no_split 2025-12-04T09:13:04.3951020Z 2025-12-04T09:13:04.3951561Z Finished distributed/test_dynamo_distributed 1/2 ... [2025-12-04 09:13:04.390605][1210.492734693], took 6.52min 2025-12-04T09:13:04.3952954Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_dynamo_distributed/distributed.test_dynamo_distributed-7d68e185dc40b8e4.xml 2025-12-04T09:13:04.8741231Z Uploading artifacts took 0.17 seconds 2025-12-04T09:13:04.8746431Z Running distributed/fsdp/test_fsdp_apply 1/1 ... [2025-12-04 09:13:04.874122][1210.976252892] 2025-12-04T09:13:04.8747017Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:13:04.8748278Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_apply.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:13:04.874487] 2025-12-04T09:14:51.0827825Z 2025-12-04T09:14:51.0829218Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_apply 1/1 (test/test-reports/distributed.fsdp.test_fsdp_apply_1.1_ffe46bf2b700541c_.log) 2025-12-04T09:14:51.0831404Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-550d70029afd2dcd.xml 2025-12-04T09:14:51.0832947Z ============================= test session starts ============================== 2025-12-04T09:14:51.0833637Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:14:51.0834232Z cachedir: .pytest_cache 2025-12-04T09:14:51.0834955Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:14:51.0835745Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:14:51.0836414Z configfile: pytest.ini 2025-12-04T09:14:51.0837135Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:14:51.0837955Z collecting ... 
collected 3 items 2025-12-04T09:14:51.0838375Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:14:51.0840349Z Running 3 items in this shard: test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda, test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda, test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda 2025-12-04T09:14:51.0841913Z 2025-12-04T09:14:51.0842865Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda I1204 09:13:08.320000 29141 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 29193 2025-12-04T09:14:51.0844444Z I1204 09:13:08.321000 29141 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 29194 2025-12-04T09:14:51.0846333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:14:51.0847831Z self.encoder = TransformerEncoder( 2025-12-04T09:14:51.0849819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:14:51.0851821Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.0853624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:14:51.0855132Z self.encoder = TransformerEncoder( 2025-12-04T09:14:51.0857078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:14:51.0859079Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.0859501Z File "<string>", line 1, in <module> 2025-12-04T09:14:51.0860115Z File "/opt/conda/envs/py_3.10/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2025-12-04T09:14:51.0860749Z exitcode = _main(fd, parent_sentinel) 2025-12-04T09:14:51.0861353Z File "/opt/conda/envs/py_3.10/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2025-12-04T09:14:51.0861981Z return self._bootstrap(parent_sentinel) 2025-12-04T09:14:51.0862619Z File "/opt/conda/envs/py_3.10/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2025-12-04T09:14:51.0863244Z self.run() 2025-12-04T09:14:51.0863742Z File "/opt/conda/envs/py_3.10/lib/python3.10/multiprocessing/process.py", line 108, in run 2025-12-04T09:14:51.0864364Z self._target(*self._args, **self._kwargs) 2025-12-04T09:14:51.0865109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1272, in _run 2025-12-04T09:14:51.0865958Z self.run_test(test_name, pipe) 2025-12-04T09:14:51.0866714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.0867543Z getattr(self, test_name)() 2025-12-04T09:14:51.0868275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.0869025Z fn() 2025-12-04T09:14:51.0869653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.0870380Z method(*args, **kwargs) 2025-12-04T09:14:51.0871073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.0871809Z method(*args, **kwargs) 2025-12-04T09:14:51.0872484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.0873219Z method(*args, **kwargs) 2025-12-04T09:14:51.0873988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 428, in instantiated_test 2025-12-04T09:14:51.0874825Z result = test(self, **param_kwargs) 2025-12-04T09:14:51.0875584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 227, in wrapper 2025-12-04T09:14:51.0876351Z return func(*args, **kwargs) 2025-12-04T09:14:51.0877107Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_apply.py", line 113, in test_apply_in_summon_raises_error 2025-12-04T09:14:51.0877933Z transformer.apply(self._init_linear_weights) 2025-12-04T09:14:51.0879683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 586, in apply 2025-12-04T09:14:51.0880557Z self._assert_state(TrainingState.IDLE) 2025-12-04T09:14:51.0881459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1028, in _assert_state 2025-12-04T09:14:51.0882341Z traceback.print_stack() 2025-12-04T09:14:51.0884490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:3328: UserWarning: CUDA caching allocator reports a memory leak not verified by the driver API in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda!
Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 347013120 and is now 347013120. 2025-12-04T09:14:51.0886512Z with policy(): 2025-12-04T09:14:51.0887141Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:14:51.0888277Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:14:51.0889942Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.0891700Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:14:51.0893296Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.0895056Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:14:51.0896562Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.0898148Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.0899835Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.0901424Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.0903026Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.0904565Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:14:51.0906205Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.0907858Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:14:51.0909932Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 453967872 and is now 456065024. 
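The two messages above show why one rank only warns while the other fails hard. On device 1 the caching allocator grew from 512 to 19456 bytes but the driver-reported total stayed at 347013120, so the check emits the "not verified by the driver API" UserWarning; on device 0 the driver-reported total also grew, from 453967872 to 456065024 bytes, a difference of 2097152 bytes (2 MiB), so the check raises the "driver API confirmed a leak" RuntimeError. A rough sketch of that before/after comparison follows; it is illustrative only, the real logic being the policy() context manager from torch/testing/_internal/common_utils.py that appears in the traceback, and the "driver bytes" measure here is an assumption for illustration.

# Illustrative sketch only: the kind of per-device before/after comparison behind the
# leak-check messages above.
import torch

def memory_counters(device: int):
    free, total = torch.cuda.mem_get_info(device)
    # caching-allocator bytes, and an approximation of driver-side allocated bytes
    return torch.cuda.memory_allocated(device), total - free

caching_before, driver_before = memory_counters(0)
# ... test body runs here ...
caching_after, driver_after = memory_counters(0)

if caching_after > caching_before:
    if driver_after > driver_before:
        # both counters grew -> "CUDA driver API confirmed a leak" (test fails)
        raise RuntimeError("confirmed leak")
    else:
        # only the caching allocator grew -> "memory leak not verified by the driver API"
        print("warning: possible leak")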
2025-12-04T09:14:51.0911862Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T09:14:51.0913053Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:14:51.0914668Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda
2025-12-04T09:14:51.0916109Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T09:14:51.0917433Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:14:51.0918763Z [rank0]:E1204 09:13:13.283000 29193 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T09:14:51.0919503Z dist init r=0, world=2
2025-12-04T09:14:51.0919865Z Asserting FSDP instance is: FullyShardedDataParallel(
2025-12-04T09:14:51.0920350Z   (_fsdp_wrapped_module): TransformerWithSharedParams(
2025-12-04T09:14:51.0920762Z     (embed_tokens): Embedding(23, 16)
2025-12-04T09:14:51.0921111Z     (transformer): Transformer(
2025-12-04T09:14:51.0921445Z       (encoder): TransformerEncoder(
2025-12-04T09:14:51.0921785Z         (layers): ModuleList(
2025-12-04T09:14:51.0922107Z           (0-1): 2 x FullyShardedDataParallel(
2025-12-04T09:14:51.0922530Z             (_fsdp_wrapped_module): TransformerEncoderLayer(
2025-12-04T09:14:51.0922950Z               (self_attn): MultiheadAttention(
2025-12-04T09:14:51.0923696Z                 (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True)
2025-12-04T09:14:51.0924280Z               )
2025-12-04T09:14:51.0924638Z               (linear1): Linear(in_features=16, out_features=8, bias=True)
2025-12-04T09:14:51.0925108Z               (dropout): Dropout(p=0.1, inplace=False)
2025-12-04T09:14:51.0925582Z               (linear2): Linear(in_features=8, out_features=16, bias=True)
2025-12-04T09:14:51.0926121Z               (norm1): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
2025-12-04T09:14:51.0926658Z               (norm2): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
2025-12-04T09:14:51.0927184Z               (dropout1): Dropout(p=0.1, inplace=False)
2025-12-04T09:14:51.0927594Z               (dropout2): Dropout(p=0.1, inplace=False)
2025-12-04T09:14:51.0927954Z             )
2025-12-04T09:14:51.0928285Z           )
2025-12-04T09:14:51.0928499Z         )
2025-12-04T09:14:51.0928820Z         (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
2025-12-04T09:14:51.0929209Z       )
2025-12-04T09:14:51.0929445Z       (decoder): TransformerDecoder(
2025-12-04T09:14:51.0929777Z         (layers): ModuleList(
2025-12-04T09:14:51.0930095Z           (0-1): 2 x FullyShardedDataParallel(
2025-12-04T09:14:51.0930507Z             (_fsdp_wrapped_module): TransformerDecoderLayer(
2025-12-04T09:14:51.0930921Z               (self_attn): MultiheadAttention(
2025-12-04T09:14:51.0931509Z                 (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True)
2025-12-04T09:14:51.0932057Z               )
2025-12-04T09:14:51.0932341Z               (multihead_attn): MultiheadAttention(
2025-12-04T09:14:51.0933106Z                 (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True)
2025-12-04T09:14:51.0933737Z               )
2025-12-04T09:14:51.0934261Z               (linear1): Linear(in_features=16, out_features=8, bias=True)
2025-12-04T09:14:51.0934823Z               (dropout): Dropout(p=0.1, inplace=False)
2025-12-04T09:14:51.0935302Z               (linear2): Linear(in_features=8, out_features=16, bias=True)
2025-12-04T09:14:51.0935837Z               (norm1): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
2025-12-04T09:14:51.0936385Z               (norm2): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
2025-12-04T09:14:51.0936928Z               (norm3): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
2025-12-04T09:14:51.0937404Z               (dropout1): Dropout(p=0.1, inplace=False)
2025-12-04T09:14:51.0937826Z               (dropout2): Dropout(p=0.1, inplace=False)
2025-12-04T09:14:51.0938247Z               (dropout3): Dropout(p=0.1, inplace=False)
2025-12-04T09:14:51.0938613Z             )
2025-12-04T09:14:51.0938839Z           )
2025-12-04T09:14:51.0939064Z         )
2025-12-04T09:14:51.0939491Z         (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
2025-12-04T09:14:51.0939912Z       )
2025-12-04T09:14:51.0940137Z     )
2025-12-04T09:14:51.0940494Z     (output_proj): Linear(in_features=16, out_features=23, bias=True)
2025-12-04T09:14:51.0941127Z     (bn): BatchNorm1d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
2025-12-04T09:14:51.0941643Z   )
2025-12-04T09:14:51.0941866Z )
2025-12-04T09:14:51.0942392Z ERROR: expected to be in states [<TrainingState.IDLE: 1>] but current state is TrainingState.SUMMON_FULL_PARAMS
2025-12-04T09:14:51.0944114Z [rank0]:[W1204 09:13:13.709572992 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T09:14:51.0945510Z FAILED [6.7172s] [ 33%]
2025-12-04T09:14:51.0945691Z
2025-12-04T09:14:51.0945853Z =================================== FAILURES ===================================
2025-12-04T09:14:51.0946512Z _____________ TestApplyCUDA.test_apply_in_summon_raises_error_cuda _____________
2025-12-04T09:14:51.0947035Z Traceback (most recent call last):
2025-12-04T09:14:51.0947803Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T09:14:51.0948580Z     self._join_processes(fn)
2025-12-04T09:14:51.0949348Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T09:14:51.0950193Z     self._check_return_codes(fn, elapsed_time)
2025-12-04T09:14:51.0951057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T09:14:51.0951942Z     raise RuntimeError(error)
2025-12-04T09:14:51.0952382Z RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T09:14:51.0952865Z Traceback (most recent call last):
2025-12-04T09:14:51.0953635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:14:51.0954394Z     getattr(self, test_name)()
2025-12-04T09:14:51.0955124Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:14:51.0955870Z     fn()
2025-12-04T09:14:51.0956505Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:14:51.0957243Z     method(*args, **kwargs)
2025-12-04T09:14:51.0957935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:14:51.0958672Z     method(*args, **kwargs)
2025-12-04T09:14:51.0959345Z   File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.0960078Z with policy(): 2025-12-04T09:14:51.0960735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.0961470Z raise RuntimeError(msg) 2025-12-04T09:14:51.0962741Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 453967872 and is now 456065024. 2025-12-04T09:14:51.0963949Z 2025-12-04T09:14:51.0964161Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.0965076Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T09:14:51.0965769Z 2025-12-04T09:14:51.0966040Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.0966430Z 2025-12-04T09:14:51.0966435Z 2025-12-04T09:14:51.0966710Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:14:51.0967325Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:14:51.0968505Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-550d70029afd2dcd.xml - 2025-12-04T09:14:51.0969596Z =========================== short test summary info ============================ 2025-12-04T09:14:51.0970620Z FAILED [6.7172s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:14:51.0971667Z Traceback (most recent call last): 2025-12-04T09:14:51.0972413Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.0973175Z getattr(self, test_name)() 2025-12-04T09:14:51.0974135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.0974916Z fn() 2025-12-04T09:14:51.0975583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.0976332Z method(*args, **kwargs) 2025-12-04T09:14:51.0977051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.0977809Z method(*args, **kwargs) 2025-12-04T09:14:51.0978524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.0979543Z with policy(): 2025-12-04T09:14:51.0980232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.0980999Z raise RuntimeError(msg) 2025-12-04T09:14:51.0982320Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 453967872 and is now 456065024. 
2025-12-04T09:14:51.0983554Z 2025-12-04T09:14:51.0983770Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.0984709Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T09:14:51.0985440Z 2025-12-04T09:14:51.0985707Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.0986306Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:14:51.0986776Z ============================== 1 failed in 6.93s =============================== 2025-12-04T09:14:51.0987177Z Got exit code 1 2025-12-04T09:14:51.0987448Z Retrying single test... 2025-12-04T09:14:51.0988269Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-20b4ace7c31b01bc.xml 2025-12-04T09:14:51.0989213Z ============================= test session starts ============================== 2025-12-04T09:14:51.0989868Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:14:51.0990568Z cachedir: .pytest_cache 2025-12-04T09:14:51.0991250Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:14:51.0992009Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:14:51.0992353Z configfile: pytest.ini 2025-12-04T09:14:51.0993043Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:14:51.0993895Z collecting ... collected 3 items / 2 deselected / 1 selected 2025-12-04T09:14:51.0995081Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda 2025-12-04T09:14:51.0995934Z Running 1 items in this shard 2025-12-04T09:14:51.0996133Z 2025-12-04T09:14:51.0997028Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda I1204 09:13:19.439000 29323 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 29375 2025-12-04T09:14:51.0998510Z I1204 09:13:19.440000 29323 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 29376 2025-12-04T09:14:51.1000288Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:14:51.1001913Z self.encoder = TransformerEncoder( 2025-12-04T09:14:51.1003319Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:14:51.1004710Z self.encoder = TransformerEncoder( 2025-12-04T09:14:51.1006565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:14:51.1008528Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.1010409Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:14:51.1012285Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.1012672Z File "", line 1, in 2025-12-04T09:14:51.1013253Z File "/opt/conda/envs/py_3.10/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2025-12-04T09:14:51.1014100Z exitcode = _main(fd, parent_sentinel) 2025-12-04T09:14:51.1014702Z File "/opt/conda/envs/py_3.10/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2025-12-04T09:14:51.1015315Z return self._bootstrap(parent_sentinel) 2025-12-04T09:14:51.1015966Z File "/opt/conda/envs/py_3.10/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2025-12-04T09:14:51.1016589Z self.run() 2025-12-04T09:14:51.1017076Z File "/opt/conda/envs/py_3.10/lib/python3.10/multiprocessing/process.py", line 108, in run 2025-12-04T09:14:51.1017697Z self._target(*self._args, **self._kwargs) 2025-12-04T09:14:51.1018464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1272, in _run 2025-12-04T09:14:51.1019224Z self.run_test(test_name, pipe) 2025-12-04T09:14:51.1019996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1020796Z getattr(self, test_name)() 2025-12-04T09:14:51.1021547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1022306Z fn() 2025-12-04T09:14:51.1022952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1023710Z method(*args, **kwargs) 2025-12-04T09:14:51.1024493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1025235Z method(*args, **kwargs) 2025-12-04T09:14:51.1026047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1026717Z method(*args, **kwargs) 2025-12-04T09:14:51.1027414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 428, in instantiated_test 2025-12-04T09:14:51.1028178Z result = test(self, **param_kwargs) 2025-12-04T09:14:51.1028883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 227, in wrapper 2025-12-04T09:14:51.1029593Z return func(*args, **kwargs) 2025-12-04T09:14:51.1030273Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_apply.py", line 113, in test_apply_in_summon_raises_error 2025-12-04T09:14:51.1031035Z transformer.apply(self._init_linear_weights) 2025-12-04T09:14:51.1031818Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 586, in apply 2025-12-04T09:14:51.1032578Z self._assert_state(TrainingState.IDLE) 2025-12-04T09:14:51.1033373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1028, in _assert_state 2025-12-04T09:14:51.1034159Z traceback.print_stack() 2025-12-04T09:14:51.1035929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:3328: UserWarning: CUDA caching allocator reports a memory leak not verified by the driver API in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 456065024 and is now 456065024. 2025-12-04T09:14:51.1037992Z with policy(): 2025-12-04T09:14:51.1038744Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:14:51.1039843Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:14:51.1041471Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1043070Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:14:51.1044654Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1046140Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:14:51.1047688Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1049187Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1050684Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1052177Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1053978Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1055531Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:14:51.1057087Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1058684Z [rank1]:E1204 09:13:24.449000 29376 
site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:14:51.1060872Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 344915968 and is now 347013120. 2025-12-04T09:14:51.1062937Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1064104Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1066145Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T09:14:51.1067504Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1070064Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1071331Z [rank1]:E1204 09:13:24.449000 29376 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:14:51.1072041Z dist init r=1, world=2 2025-12-04T09:14:51.1072300Z FAILED [6.6918s] [100%] 2025-12-04T09:14:51.1072456Z 2025-12-04T09:14:51.1072591Z =================================== FAILURES =================================== 2025-12-04T09:14:51.1073100Z _____________ TestApplyCUDA.test_apply_in_summon_raises_error_cuda _____________ 2025-12-04T09:14:51.1073585Z Traceback (most recent call last): 2025-12-04T09:14:51.1074280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:14:51.1074997Z self._join_processes(fn) 2025-12-04T09:14:51.1075715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:14:51.1076494Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:14:51.1077270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:14:51.1078044Z raise RuntimeError(error) 2025-12-04T09:14:51.1078455Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:14:51.1079221Z Traceback (most recent call last): 2025-12-04T09:14:51.1080018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1080826Z getattr(self, test_name)() 2025-12-04T09:14:51.1081581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1082348Z fn() 2025-12-04T09:14:51.1082995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1083888Z method(*args, **kwargs) 2025-12-04T09:14:51.1084590Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1085349Z method(*args, **kwargs) 2025-12-04T09:14:51.1086057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1086805Z with policy(): 2025-12-04T09:14:51.1087481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1088246Z raise RuntimeError(msg) 2025-12-04T09:14:51.1089567Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 344915968 and is now 347013120. 2025-12-04T09:14:51.1090805Z 2025-12-04T09:14:51.1091173Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1092229Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T09:14:51.1092912Z 2025-12-04T09:14:51.1093165Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1093626Z 2025-12-04T09:14:51.1093631Z 2025-12-04T09:14:51.1094018Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:14:51.1094653Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:14:51.1095864Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-20b4ace7c31b01bc.xml - 2025-12-04T09:14:51.1097092Z =========================== short test summary info ============================ 2025-12-04T09:14:51.1098175Z FAILED [6.6918s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:14:51.1099211Z Traceback (most recent call last): 2025-12-04T09:14:51.1099991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1100792Z getattr(self, test_name)() 2025-12-04T09:14:51.1101547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1102321Z fn() 2025-12-04T09:14:51.1102952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1103715Z method(*args, **kwargs) 2025-12-04T09:14:51.1104427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1105170Z method(*args, **kwargs) 2025-12-04T09:14:51.1105882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1106716Z with policy(): 2025-12-04T09:14:51.1107375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1108111Z raise RuntimeError(msg) 2025-12-04T09:14:51.1109384Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! 
Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 344915968 and is now 347013120. 2025-12-04T09:14:51.1110694Z 2025-12-04T09:14:51.1110899Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1111788Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T09:14:51.1112458Z 2025-12-04T09:14:51.1112767Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1113325Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:14:51.1113800Z ======================= 1 failed, 2 deselected in 6.90s ======================== 2025-12-04T09:14:51.1114197Z Got exit code 1 2025-12-04T09:14:51.1114440Z Retrying single test... 2025-12-04T09:14:51.1115221Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-fca413b3f7307fd5.xml 2025-12-04T09:14:51.1116111Z ============================= test session starts ============================== 2025-12-04T09:14:51.1116718Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:14:51.1117287Z cachedir: .pytest_cache 2025-12-04T09:14:51.1117955Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:14:51.1118687Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:14:51.1119012Z configfile: pytest.ini 2025-12-04T09:14:51.1119693Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:14:51.1120521Z collecting ... collected 3 items / 2 deselected / 1 selected 2025-12-04T09:14:51.1121471Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda 2025-12-04T09:14:51.1122312Z Running 1 items in this shard 2025-12-04T09:14:51.1122523Z 2025-12-04T09:14:51.1123413Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda I1204 09:13:30.540000 29505 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 29557 2025-12-04T09:14:51.1124994Z I1204 09:13:30.541000 29505 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 29558 2025-12-04T09:14:51.1126767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:14:51.1128343Z self.encoder = TransformerEncoder( 2025-12-04T09:14:51.1130309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
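For context on the test being retried here: per the traceback, test_apply_in_summon_raises_error calls transformer.apply(self._init_linear_weights) at test/distributed/fsdp/test_fsdp_apply.py:113 while the FSDP wrapper's parameters are summoned, and FSDP's apply() first asserts the IDLE training state. A minimal sketch of that pattern is below; it is not the actual test body, and the helper names are made up for illustration.

# Minimal sketch of the pattern the retried test exercises: nn.Module.apply() on an
# FSDP wrapper is only legal in the IDLE state, so calling it inside
# summon_full_params() is expected to raise.
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def init_linear_weights(module: torch.nn.Module) -> None:
    if isinstance(module, torch.nn.Linear):
        torch.nn.init.ones_(module.weight)

def apply_inside_summon(wrapped: FSDP) -> None:
    with FSDP.summon_full_params(wrapped):
        # the wrapper is now in TrainingState.SUMMON_FULL_PARAMS, not IDLE,
        # so this call should raise
        wrapped.apply(init_linear_weights)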
2025-12-04T09:14:51.1132199Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.1133895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:14:51.1135397Z self.encoder = TransformerEncoder( 2025-12-04T09:14:51.1137345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:14:51.1139359Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.1139786Z File "", line 1, in 2025-12-04T09:14:51.1140404Z File "/opt/conda/envs/py_3.10/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2025-12-04T09:14:51.1141096Z exitcode = _main(fd, parent_sentinel) 2025-12-04T09:14:51.1141702Z File "/opt/conda/envs/py_3.10/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2025-12-04T09:14:51.1142330Z return self._bootstrap(parent_sentinel) 2025-12-04T09:14:51.1142985Z File "/opt/conda/envs/py_3.10/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2025-12-04T09:14:51.1143598Z self.run() 2025-12-04T09:14:51.1144108Z File "/opt/conda/envs/py_3.10/lib/python3.10/multiprocessing/process.py", line 108, in run 2025-12-04T09:14:51.1144732Z self._target(*self._args, **self._kwargs) 2025-12-04T09:14:51.1145485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1272, in _run 2025-12-04T09:14:51.1146328Z self.run_test(test_name, pipe) 2025-12-04T09:14:51.1147028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1147734Z getattr(self, test_name)() 2025-12-04T09:14:51.1148412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1149100Z fn() 2025-12-04T09:14:51.1149675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1150342Z method(*args, **kwargs) 2025-12-04T09:14:51.1150989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1151668Z method(*args, **kwargs) 2025-12-04T09:14:51.1152298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1153047Z method(*args, **kwargs) 2025-12-04T09:14:51.1153763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 428, in instantiated_test 2025-12-04T09:14:51.1154538Z result = test(self, **param_kwargs) 2025-12-04T09:14:51.1155235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 227, in wrapper 2025-12-04T09:14:51.1155950Z return func(*args, **kwargs) 2025-12-04T09:14:51.1156648Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_apply.py", line 113, in 
test_apply_in_summon_raises_error 2025-12-04T09:14:51.1157408Z transformer.apply(self._init_linear_weights) 2025-12-04T09:14:51.1158183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 586, in apply 2025-12-04T09:14:51.1158952Z self._assert_state(TrainingState.IDLE) 2025-12-04T09:14:51.1159744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1028, in _assert_state 2025-12-04T09:14:51.1160523Z traceback.print_stack() 2025-12-04T09:14:51.1162295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:3328: UserWarning: CUDA caching allocator reports a memory leak not verified by the driver API in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 456065024 and is now 456065024. 2025-12-04T09:14:51.1164079Z with policy(): 2025-12-04T09:14:51.1164636Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:14:51.1165644Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:14:51.1167126Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1168643Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:14:51.1170103Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1171464Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:14:51.1172802Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1174510Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1176114Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1177704Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1179481Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1181023Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:14:51.1182695Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1184306Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:14:51.1186497Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 344915968 and is now 347013120. 2025-12-04T09:14:51.1188549Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1189713Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1191787Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T09:14:51.1193232Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1194386Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1195708Z [rank1]:E1204 09:13:35.494000 29558 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:14:51.1196448Z dist init r=1, world=2 2025-12-04T09:14:51.1196725Z FAILED [6.6933s] [100%] 2025-12-04T09:14:51.1196892Z 2025-12-04T09:14:51.1197048Z =================================== FAILURES =================================== 2025-12-04T09:14:51.1197587Z _____________ TestApplyCUDA.test_apply_in_summon_raises_error_cuda _____________ 2025-12-04T09:14:51.1198076Z Traceback (most recent call last): 2025-12-04T09:14:51.1198907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:14:51.1199662Z self._join_processes(fn) 2025-12-04T09:14:51.1200411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:14:51.1201230Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:14:51.1202067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:14:51.1202990Z raise RuntimeError(error) 2025-12-04T09:14:51.1203385Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:14:51.1203834Z Traceback (most recent call last): 2025-12-04T09:14:51.1204536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1205240Z getattr(self, test_name)() 2025-12-04T09:14:51.1205911Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1206592Z fn() 2025-12-04T09:14:51.1207166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T09:14:51.1207826Z method(*args, **kwargs) 2025-12-04T09:14:51.1208465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1209146Z method(*args, **kwargs) 2025-12-04T09:14:51.1209766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1210488Z with policy(): 2025-12-04T09:14:51.1211094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1211775Z raise RuntimeError(msg) 2025-12-04T09:14:51.1212930Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 344915968 and is now 347013120. 2025-12-04T09:14:51.1214313Z 2025-12-04T09:14:51.1214531Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1215461Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T09:14:51.1216169Z 2025-12-04T09:14:51.1216448Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1216851Z 2025-12-04T09:14:51.1216856Z 2025-12-04T09:14:51.1217079Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:14:51.1217710Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:14:51.1218930Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-fca413b3f7307fd5.xml - 2025-12-04T09:14:51.1220067Z =========================== short test summary info ============================ 2025-12-04T09:14:51.1221118Z FAILED [6.6933s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:14:51.1222114Z Traceback (most recent call last): 2025-12-04T09:14:51.1222907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1223712Z getattr(self, test_name)() 2025-12-04T09:14:51.1224455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1225224Z fn() 2025-12-04T09:14:51.1225933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1226724Z method(*args, **kwargs) 2025-12-04T09:14:51.1227349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1228022Z method(*args, **kwargs) 2025-12-04T09:14:51.1228655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1229311Z with policy(): 2025-12-04T09:14:51.1229925Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1230616Z raise RuntimeError(msg) 2025-12-04T09:14:51.1231792Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 344915968 and is now 347013120. 2025-12-04T09:14:51.1232882Z 2025-12-04T09:14:51.1233077Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1233912Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T09:14:51.1234555Z 2025-12-04T09:14:51.1234795Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1235324Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:14:51.1235764Z ======================= 1 failed, 2 deselected in 6.90s ======================== 2025-12-04T09:14:51.1236200Z Got exit code 1 2025-12-04T09:14:51.1236802Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda 2025-12-04T09:14:51.1237737Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:14:51.1238787Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-ee4bf9b90915483d.xml 2025-12-04T09:14:51.1239627Z ============================= test session starts ============================== 2025-12-04T09:14:51.1240219Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:14:51.1240746Z cachedir: .pytest_cache 2025-12-04T09:14:51.1241383Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:14:51.1242090Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:14:51.1242411Z configfile: pytest.ini 2025-12-04T09:14:51.1243044Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:14:51.1243832Z collecting ... collected 3 items / 1 deselected / 2 selected 2025-12-04T09:14:51.1244269Z stepcurrent: skipping 1 already run items. 2025-12-04T09:14:51.1244605Z Running 2 items in this shard 2025-12-04T09:14:51.1244805Z 2025-12-04T09:14:51.1245617Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda I1204 09:13:41.619000 29687 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 29739 2025-12-04T09:14:51.1246987Z I1204 09:13:41.620000 29687 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 29740 2025-12-04T09:14:51.1249066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:14:51.1250920Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.1252692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:14:51.1254861Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.1271490Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:14:51.1272591Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:14:51.1274207Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1275778Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:14:51.1277389Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1278956Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:14:51.1280634Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1282380Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1283980Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1285570Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1287167Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1288716Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:14:51.1290271Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1292130Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:14:51.1294354Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 453967872 and is now 458162176. 
2025-12-04T09:14:51.1296380Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1297557Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1299423Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T09:14:51.1300907Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1302138Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1303548Z [rank0]:E1204 09:13:46.933000 29739 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:14:51.1304682Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:14:51.1305909Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:14:51.1307525Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1308983Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:14:51.1310435Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1311781Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:14:51.1313180Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1314595Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1316013Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1317425Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1318833Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1320219Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:14:51.1321603Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1323025Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:14:51.1324920Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 344915968 and is now 349110272. 2025-12-04T09:14:51.1326727Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1327820Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1329405Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T09:14:51.1330726Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1331807Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1333060Z [rank1]:E1204 09:13:46.933000 29740 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:14:51.1334020Z dist init r=0, world=2 2025-12-04T09:14:51.1334321Z dist init r=1, world=2 2025-12-04T09:14:51.1335660Z [rank0]:[W1204 09:13:47.395221563 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:14:51.1337060Z FAILED [7.1862s] [ 50%] 2025-12-04T09:14:51.1337255Z 2025-12-04T09:14:51.1337408Z =================================== FAILURES =================================== 2025-12-04T09:14:51.1337966Z _________________ TestApplyCUDA.test_nested_module_apply_cuda __________________ 2025-12-04T09:14:51.1338481Z Traceback (most recent call last): 2025-12-04T09:14:51.1339269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:14:51.1340149Z self._join_processes(fn) 2025-12-04T09:14:51.1340956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:14:51.1341813Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:14:51.1342702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:14:51.1343568Z raise RuntimeError(error) 2025-12-04T09:14:51.1344008Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:14:51.1344505Z Traceback (most recent call last): 2025-12-04T09:14:51.1345291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1346282Z getattr(self, test_name)() 2025-12-04T09:14:51.1346941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1347632Z fn() 2025-12-04T09:14:51.1348213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1348880Z method(*args, **kwargs) 2025-12-04T09:14:51.1349518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1350193Z method(*args, **kwargs) 2025-12-04T09:14:51.1350828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1351480Z with policy(): 2025-12-04T09:14:51.1352083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1352768Z raise RuntimeError(msg) 2025-12-04T09:14:51.1353899Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 453967872 and is now 458162176. 2025-12-04T09:14:51.1354963Z 2025-12-04T09:14:51.1355215Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1356001Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T09:14:51.1356598Z 2025-12-04T09:14:51.1356835Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1357195Z 2025-12-04T09:14:51.1357199Z 2025-12-04T09:14:51.1357412Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:14:51.1357960Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:14:51.1359046Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-ee4bf9b90915483d.xml - 2025-12-04T09:14:51.1360052Z =========================== short test summary info ============================ 2025-12-04T09:14:51.1360977Z FAILED [7.1862s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:14:51.1361822Z Traceback (most recent call last): 2025-12-04T09:14:51.1362531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1363245Z getattr(self, test_name)() 2025-12-04T09:14:51.1363914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1364586Z fn() 2025-12-04T09:14:51.1365168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1365902Z method(*args, **kwargs) 2025-12-04T09:14:51.1366522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1367197Z method(*args, **kwargs) 2025-12-04T09:14:51.1367832Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1368497Z with policy(): 2025-12-04T09:14:51.1369092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1369769Z raise RuntimeError(msg) 2025-12-04T09:14:51.1370897Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 453967872 and is now 458162176. 2025-12-04T09:14:51.1371952Z 2025-12-04T09:14:51.1372157Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1372940Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T09:14:51.1373621Z 2025-12-04T09:14:51.1374052Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1374651Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:14:51.1375158Z ======================= 1 failed, 1 deselected in 7.40s ======================== 2025-12-04T09:14:51.1375564Z Got exit code 1 2025-12-04T09:14:51.1375836Z Retrying single test... 
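The FSDP UserWarning repeated above (torch/distributed/fsdp/_init_utils.py:571) asks each rank either to call torch.cuda.set_device() before wrapping the model or to pass an explicit device index as the device_id argument. A minimal sketch of that fix, assuming the default process group is already initialized and one GPU per rank; the placeholder module and sizes are illustrative, not taken from the test:

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_with_fsdp(rank: int) -> FSDP:
    # Bind this process to its GPU so a bare "cuda" device resolves to an
    # explicit index instead of triggering the warning above.
    torch.cuda.set_device(rank)
    model = nn.Linear(8, 8)  # placeholder module, not the test's model
    # Equivalent alternative: pass an explicit index via device_id
    # instead of the bare "cuda" string.
    return FSDP(model, device_id=torch.device("cuda", rank))

Note this only silences the "does not have an explicit index" warning; it is separate from the memory-leak failure reported above.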
2025-12-04T09:14:51.1376659Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-5878d33d525e22d1.xml 2025-12-04T09:14:51.1377584Z ============================= test session starts ============================== 2025-12-04T09:14:51.1378236Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:14:51.1379049Z cachedir: .pytest_cache 2025-12-04T09:14:51.1379758Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:14:51.1380657Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:14:51.1381019Z configfile: pytest.ini 2025-12-04T09:14:51.1381752Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:14:51.1382624Z collecting ... collected 3 items / 2 deselected / 1 selected 2025-12-04T09:14:51.1383595Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda 2025-12-04T09:14:51.1384461Z Running 1 items in this shard 2025-12-04T09:14:51.1384677Z 2025-12-04T09:14:51.1385597Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda I1204 09:13:53.109000 29874 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 29926 2025-12-04T09:14:51.1387135Z I1204 09:13:53.110000 29874 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 29927 2025-12-04T09:14:51.1389496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:14:51.1391634Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.1393413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:14:51.1395268Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.1395952Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:14:51.1396952Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:14:51.1398446Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1399914Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:14:51.1401387Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1402739Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:14:51.1404076Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1405497Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1406911Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1408329Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1409790Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1411175Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:14:51.1412561Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1414269Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:14:51.1416421Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 456065024 and is now 458162176. 
2025-12-04T09:14:51.1418421Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1419586Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1421367Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T09:14:51.1422853Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1424149Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1425542Z [rank0]:E1204 09:13:58.438000 29926 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:14:51.1426813Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:14:51.1427816Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:14:51.1429311Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1430766Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:14:51.1432236Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1433591Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:14:51.1434929Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1436340Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1437745Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1439213Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1440627Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1441997Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:14:51.1443378Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1444782Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:14:51.1446698Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 344915968 and is now 349110272. 2025-12-04T09:14:51.1448484Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1449525Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1451107Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T09:14:51.1452481Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1453648Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1455227Z [rank1]:E1204 09:13:58.438000 29927 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:14:51.1456029Z dist init r=1, world=2 2025-12-04T09:14:51.1456307Z dist init r=0, world=2 2025-12-04T09:14:51.1457648Z [rank0]:[W1204 09:13:58.911414389 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:14:51.1459032Z FAILED [7.3434s] [100%] 2025-12-04T09:14:51.1459214Z 2025-12-04T09:14:51.1459377Z =================================== FAILURES =================================== 2025-12-04T09:14:51.1459920Z _________________ TestApplyCUDA.test_nested_module_apply_cuda __________________ 2025-12-04T09:14:51.1460447Z Traceback (most recent call last): 2025-12-04T09:14:51.1461236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:14:51.1462022Z self._join_processes(fn) 2025-12-04T09:14:51.1462820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:14:51.1463690Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:14:51.1464574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:14:51.1465424Z raise RuntimeError(error) 2025-12-04T09:14:51.1465980Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:14:51.1466535Z Traceback (most recent call last): 2025-12-04T09:14:51.1467294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1467988Z getattr(self, test_name)() 2025-12-04T09:14:51.1468657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1469339Z fn() 2025-12-04T09:14:51.1469902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1470578Z method(*args, **kwargs) 2025-12-04T09:14:51.1471209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1471882Z method(*args, **kwargs) 2025-12-04T09:14:51.1472506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1473172Z with policy(): 2025-12-04T09:14:51.1473779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1474447Z raise RuntimeError(msg) 2025-12-04T09:14:51.1475574Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 456065024 and is now 458162176. 2025-12-04T09:14:51.1476640Z 2025-12-04T09:14:51.1476833Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1477626Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T09:14:51.1478215Z 2025-12-04T09:14:51.1478464Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1479204Z 2025-12-04T09:14:51.1479209Z 2025-12-04T09:14:51.1479436Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:14:51.1480070Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:14:51.1481281Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-5878d33d525e22d1.xml - 2025-12-04T09:14:51.1482404Z =========================== short test summary info ============================ 2025-12-04T09:14:51.1483418Z FAILED [7.3434s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:14:51.1484374Z Traceback (most recent call last): 2025-12-04T09:14:51.1485166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1485965Z getattr(self, test_name)() 2025-12-04T09:14:51.1486722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1487500Z fn() 2025-12-04T09:14:51.1488152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1488901Z method(*args, **kwargs) 2025-12-04T09:14:51.1489618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1490378Z method(*args, **kwargs) 2025-12-04T09:14:51.1491183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1492010Z with policy(): 2025-12-04T09:14:51.1492651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1493630Z raise RuntimeError(msg) 2025-12-04T09:14:51.1495168Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 456065024 and is now 458162176. 2025-12-04T09:14:51.1496378Z 2025-12-04T09:14:51.1496595Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1497481Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T09:14:51.1498144Z 2025-12-04T09:14:51.1498427Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1499020Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:14:51.1499513Z ======================= 1 failed, 2 deselected in 7.55s ======================== 2025-12-04T09:14:51.1499942Z Got exit code 1 2025-12-04T09:14:51.1500216Z Retrying single test... 
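The RuntimeError repeated above reports the CUDA caching allocator's allocated bytes before the test (512) and after it (2560), together with the driver-level allocation, and flags the growth as a leak. A rough before/after sketch of that style of check, purely illustrative and not the actual common_utils.py leak checker:

import torch

def run_with_leak_check(fn, device: int = 0) -> None:
    # Snapshot caching-allocator usage before running the test body.
    torch.cuda.synchronize(device)
    before = torch.cuda.memory_allocated(device)
    fn()
    # Growth after the body finishes suggests tensors allocated inside
    # fn() were never released.
    torch.cuda.synchronize(device)
    after = torch.cuda.memory_allocated(device)
    if after > before:
        raise RuntimeError(
            f"possible CUDA memory leak: {before} -> {after} allocated bytes on device {device}"
        )

Usage would be run_with_leak_check(lambda: some_test()), with some_test standing in for the failing test body (a hypothetical name used only for illustration).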
2025-12-04T09:14:51.1501034Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-71ced988c25116a1.xml 2025-12-04T09:14:51.1501974Z ============================= test session starts ============================== 2025-12-04T09:14:51.1502632Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:14:51.1503226Z cachedir: .pytest_cache 2025-12-04T09:14:51.1503920Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:14:51.1504704Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:14:51.1505059Z configfile: pytest.ini 2025-12-04T09:14:51.1505874Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:14:51.1506861Z collecting ... collected 3 items / 2 deselected / 1 selected 2025-12-04T09:14:51.1507716Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda 2025-12-04T09:14:51.1508491Z Running 1 items in this shard 2025-12-04T09:14:51.1508680Z 2025-12-04T09:14:51.1509489Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda I1204 09:14:04.609000 30061 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 30113 2025-12-04T09:14:51.1510867Z I1204 09:14:04.610000 30061 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 30114 2025-12-04T09:14:51.1512953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:14:51.1514747Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.1516531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:14:51.1518301Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.1518982Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:14:51.1519993Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:14:51.1521547Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1523013Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:14:51.1524466Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1525835Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:14:51.1527178Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1528590Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1530014Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1531408Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1532825Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1534509Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:14:51.1536133Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1537728Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:14:51.1539884Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 456065024 and is now 458162176. 
2025-12-04T09:14:51.1541908Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1543079Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1544878Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T09:14:51.1546515Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1547610Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1548860Z [rank0]:E1204 09:14:09.936000 30113 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:14:51.1549880Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:14:51.1550887Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:14:51.1552468Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1553935Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:14:51.1555407Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1556771Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:14:51.1558125Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1559530Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1560949Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1562366Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1563789Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1565199Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:14:51.1566579Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1567996Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:14:51.1569897Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 344915968 and is now 349110272. 2025-12-04T09:14:51.1571686Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1572719Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1574618Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T09:14:51.1576110Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1577333Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1578940Z [rank1]:E1204 09:14:09.939000 30114 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:14:51.1579734Z dist init r=0, world=2 2025-12-04T09:14:51.1580021Z dist init r=1, world=2 2025-12-04T09:14:51.1581469Z [rank0]:[W1204 09:14:10.405170072 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:14:51.1582861Z FAILED [7.3023s] [100%] 2025-12-04T09:14:51.1583039Z 2025-12-04T09:14:51.1583188Z =================================== FAILURES =================================== 2025-12-04T09:14:51.1583745Z _________________ TestApplyCUDA.test_nested_module_apply_cuda __________________ 2025-12-04T09:14:51.1584268Z Traceback (most recent call last): 2025-12-04T09:14:51.1585044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:14:51.1585842Z self._join_processes(fn) 2025-12-04T09:14:51.1586645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:14:51.1587516Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:14:51.1588387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:14:51.1589242Z raise RuntimeError(error) 2025-12-04T09:14:51.1589691Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:14:51.1590173Z Traceback (most recent call last): 2025-12-04T09:14:51.1591031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1591743Z getattr(self, test_name)() 2025-12-04T09:14:51.1592415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1593154Z fn() 2025-12-04T09:14:51.1593727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1594394Z method(*args, **kwargs) 2025-12-04T09:14:51.1595036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1595703Z method(*args, **kwargs) 2025-12-04T09:14:51.1596334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1596999Z with policy(): 2025-12-04T09:14:51.1597592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1598271Z raise RuntimeError(msg) 2025-12-04T09:14:51.1599395Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 456065024 and is now 458162176. 2025-12-04T09:14:51.1600451Z 2025-12-04T09:14:51.1600655Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1601430Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T09:14:51.1602024Z 2025-12-04T09:14:51.1602260Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1602627Z 2025-12-04T09:14:51.1602631Z 2025-12-04T09:14:51.1602998Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:14:51.1603586Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:14:51.1604725Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-71ced988c25116a1.xml - 2025-12-04T09:14:51.1605783Z =========================== short test summary info ============================ 2025-12-04T09:14:51.1606806Z FAILED [7.3023s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:14:51.1607714Z Traceback (most recent call last): 2025-12-04T09:14:51.1608447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1609202Z getattr(self, test_name)() 2025-12-04T09:14:51.1609917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1610651Z fn() 2025-12-04T09:14:51.1611245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1611963Z method(*args, **kwargs) 2025-12-04T09:14:51.1612638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1613405Z method(*args, **kwargs) 2025-12-04T09:14:51.1614266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1615015Z with policy(): 2025-12-04T09:14:51.1615700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1616458Z raise RuntimeError(msg) 2025-12-04T09:14:51.1617720Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 456065024 and is now 458162176. 2025-12-04T09:14:51.1618917Z 2025-12-04T09:14:51.1619133Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1620014Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T09:14:51.1620741Z 2025-12-04T09:14:51.1621010Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1621608Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:14:51.1622111Z ======================= 1 failed, 2 deselected in 7.51s ======================== 2025-12-04T09:14:51.1622533Z Got exit code 1 2025-12-04T09:14:51.1623143Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda 2025-12-04T09:14:51.1624146Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:14:51.1625434Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-00e807acf0912dba.xml 2025-12-04T09:14:51.1626520Z ============================= test session starts ============================== 2025-12-04T09:14:51.1627087Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:14:51.1627605Z cachedir: .pytest_cache 2025-12-04T09:14:51.1628234Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:14:51.1628911Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:14:51.1629219Z configfile: pytest.ini 2025-12-04T09:14:51.1629854Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:14:51.1630634Z collecting ... collected 3 items / 2 deselected / 1 selected 2025-12-04T09:14:51.1631046Z stepcurrent: skipping 2 already run items. 2025-12-04T09:14:51.1631379Z Running 1 items in this shard 2025-12-04T09:14:51.1631561Z 2025-12-04T09:14:51.1632406Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda I1204 09:14:16.109000 30248 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 30300 2025-12-04T09:14:51.1633847Z I1204 09:14:16.110000 30248 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 30301 2025-12-04T09:14:51.1635497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:14:51.1636819Z self.encoder = TransformerEncoder( 2025-12-04T09:14:51.1638545Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:14:51.1640313Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.1641672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:14:51.1642985Z self.encoder = TransformerEncoder( 2025-12-04T09:14:51.1644711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:14:51.1646480Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.1647208Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:14:51.1648200Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:14:51.1649687Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1651135Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:14:51.1652583Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1654206Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:14:51.1655702Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1657279Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1658858Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1660428Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1662011Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1663617Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:14:51.1665158Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1666758Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:14:51.1668689Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 456065024 and is now 458162176. 
2025-12-04T09:14:51.1670496Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1671516Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1673106Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T09:14:51.1674439Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1675514Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1676809Z [rank0]:E1204 09:14:21.668000 30300 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:14:51.1677809Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:14:51.1678942Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:14:51.1681001Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1682633Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:14:51.1684254Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1685784Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:14:51.1687287Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1688867Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1690460Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1692028Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1694037Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1695580Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:14:51.1697127Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1698719Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:14:51.1700895Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 344915968 and is now 349110272. 2025-12-04T09:14:51.1702947Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1704102Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1706081Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T09:14:51.1707407Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1708552Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1709796Z [rank1]:E1204 09:14:21.670000 30301 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:14:51.1710488Z dist init r=0, world=2 2025-12-04T09:14:51.1710738Z dist init r=1, world=2 2025-12-04T09:14:51.1711913Z [rank0]:[W1204 09:14:22.129570149 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:14:51.1713127Z FAILED [7.0992s] [100%] 2025-12-04T09:14:51.1713282Z 2025-12-04T09:14:51.1713423Z =================================== FAILURES =================================== 2025-12-04T09:14:51.1713915Z _______________ TestApplyCUDA.test_transformer_module_apply_cuda _______________ 2025-12-04T09:14:51.1714379Z Traceback (most recent call last): 2025-12-04T09:14:51.1715066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:14:51.1715769Z self._join_processes(fn) 2025-12-04T09:14:51.1716466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:14:51.1717224Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:14:51.1718178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:14:51.1718980Z raise RuntimeError(error) 2025-12-04T09:14:51.1719387Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:14:51.1719841Z Traceback (most recent call last): 2025-12-04T09:14:51.1720565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1721298Z getattr(self, test_name)() 2025-12-04T09:14:51.1722055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1722771Z fn() 2025-12-04T09:14:51.1723367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1724052Z method(*args, **kwargs) 2025-12-04T09:14:51.1724705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1725403Z method(*args, **kwargs) 2025-12-04T09:14:51.1726049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1726910Z with policy(): 2025-12-04T09:14:51.1727561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1728293Z raise RuntimeError(msg) 2025-12-04T09:14:51.1729543Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 456065024 and is now 458162176. 2025-12-04T09:14:51.1730811Z 2025-12-04T09:14:51.1731012Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1731853Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T09:14:51.1732500Z 2025-12-04T09:14:51.1732754Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1733123Z 2025-12-04T09:14:51.1733127Z 2025-12-04T09:14:51.1733400Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:14:51.1734225Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:14:51.1735438Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-00e807acf0912dba.xml - 2025-12-04T09:14:51.1736549Z =========================== short test summary info ============================ 2025-12-04T09:14:51.1737586Z FAILED [7.0992s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:14:51.1738575Z Traceback (most recent call last): 2025-12-04T09:14:51.1739357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1740149Z getattr(self, test_name)() 2025-12-04T09:14:51.1740885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1741649Z fn() 2025-12-04T09:14:51.1742287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1743036Z method(*args, **kwargs) 2025-12-04T09:14:51.1743724Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1744458Z method(*args, **kwargs) 2025-12-04T09:14:51.1745150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1745977Z with policy(): 2025-12-04T09:14:51.1746686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1747365Z raise RuntimeError(msg) 2025-12-04T09:14:51.1748514Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 456065024 and is now 458162176. 2025-12-04T09:14:51.1749592Z 2025-12-04T09:14:51.1749833Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1750630Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T09:14:51.1751245Z 2025-12-04T09:14:51.1751474Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1751993Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:14:51.1752420Z ======================= 1 failed, 2 deselected in 7.31s ======================== 2025-12-04T09:14:51.1752783Z Got exit code 1 2025-12-04T09:14:51.1753014Z Retrying single test... 
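The failures above come from PyTorch's CUDA memory-leak checker, which this shard enables (the job's test matrix requests mem_leak_check, and the repro line surfaces it as PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1). Roughly speaking, the checker snapshots the caching-allocator and driver-allocated byte counts before each test and compares them again afterwards; when both come back higher it raises the RuntimeError shown in the traceback and prints the exact local repro command (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda). The sketch below only illustrates that before/after comparison; it is not the harness's actual implementation (which lives in torch/testing/_internal/common_utils.py), and check_cuda_leak and fn are made-up names.

    import torch

    def check_cuda_leak(fn, device=0):
        # Illustration only: roughly what PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 does
        # around each test -- snapshot allocator/driver memory, run the test body,
        # then complain if both counters grew.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)
        free, total = torch.cuda.mem_get_info(device)
        driver_before = total - free

        fn()  # the test body under check

        torch.cuda.synchronize(device)
        alloc_after = torch.cuda.memory_allocated(device)
        free, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free

        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: allocator "
                f"{alloc_before} -> {alloc_after}, driver {driver_before} -> {driver_after}"
            )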
2025-12-04T09:14:51.1753734Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-b604ebea332b8d41.xml 2025-12-04T09:14:51.1754563Z ============================= test session starts ============================== 2025-12-04T09:14:51.1755133Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:14:51.1755644Z cachedir: .pytest_cache 2025-12-04T09:14:51.1756247Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:14:51.1756929Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:14:51.1757226Z configfile: pytest.ini 2025-12-04T09:14:51.1757848Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:14:51.1758600Z collecting ... collected 3 items / 2 deselected / 1 selected 2025-12-04T09:14:51.1759463Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda 2025-12-04T09:14:51.1760310Z Running 1 items in this shard 2025-12-04T09:14:51.1760491Z 2025-12-04T09:14:51.1761330Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda I1204 09:14:27.690000 30435 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 30487 2025-12-04T09:14:51.1762711Z I1204 09:14:27.691000 30435 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 30488 2025-12-04T09:14:51.1764362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:14:51.1765683Z self.encoder = TransformerEncoder( 2025-12-04T09:14:51.1766987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:14:51.1768296Z self.encoder = TransformerEncoder( 2025-12-04T09:14:51.1770028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:14:51.1771787Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.1773613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:14:51.1775760Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.1776567Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:14:51.1777692Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:14:51.1779551Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1781201Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:14:51.1782837Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1784365Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:14:51.1785863Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1787441Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1789019Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1790797Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1792198Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1793561Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:14:51.1794938Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1796351Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:14:51.1798290Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 456065024 and is now 458162176. 
2025-12-04T09:14:51.1800091Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1801115Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1802714Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T09:14:51.1804056Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1805146Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1806583Z [rank0]:E1204 09:14:33.137000 30487 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:14:51.1807589Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:14:51.1808580Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:14:51.1810064Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1811512Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:14:51.1812961Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1814622Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:14:51.1816119Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1817695Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1819267Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1820926Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1822499Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1824022Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:14:51.1825562Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1827210Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:14:51.1829153Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 344915968 and is now 349110272. 2025-12-04T09:14:51.1830960Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1831985Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1833578Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T09:14:51.1834898Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1836043Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1837283Z [rank1]:E1204 09:14:33.139000 30488 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:14:51.1837980Z dist init r=0, world=2 2025-12-04T09:14:51.1838219Z dist init r=1, world=2 2025-12-04T09:14:51.1839388Z [rank0]:[W1204 09:14:33.604758483 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:14:51.1840608Z FAILED [7.1430s] [100%] 2025-12-04T09:14:51.1840764Z 2025-12-04T09:14:51.1840901Z =================================== FAILURES =================================== 2025-12-04T09:14:51.1841392Z _______________ TestApplyCUDA.test_transformer_module_apply_cuda _______________ 2025-12-04T09:14:51.1841854Z Traceback (most recent call last): 2025-12-04T09:14:51.1842537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:14:51.1843226Z self._join_processes(fn) 2025-12-04T09:14:51.1843927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:14:51.1844701Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:14:51.1845476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:14:51.1846222Z raise RuntimeError(error) 2025-12-04T09:14:51.1846678Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:14:51.1847108Z Traceback (most recent call last): 2025-12-04T09:14:51.1847781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1848474Z getattr(self, test_name)() 2025-12-04T09:14:51.1849132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1849796Z fn() 2025-12-04T09:14:51.1850346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1851005Z method(*args, **kwargs) 2025-12-04T09:14:51.1851623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1852281Z method(*args, **kwargs) 2025-12-04T09:14:51.1852899Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1853624Z with policy(): 2025-12-04T09:14:51.1854446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1855188Z raise RuntimeError(msg) 2025-12-04T09:14:51.1856484Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 456065024 and is now 458162176. 2025-12-04T09:14:51.1857711Z 2025-12-04T09:14:51.1857920Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1858815Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T09:14:51.1859498Z 2025-12-04T09:14:51.1859760Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1860160Z 2025-12-04T09:14:51.1860165Z 2025-12-04T09:14:51.1860378Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:14:51.1861046Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:14:51.1862247Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-b604ebea332b8d41.xml - 2025-12-04T09:14:51.1863357Z =========================== short test summary info ============================ 2025-12-04T09:14:51.1865949Z FAILED [7.1430s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:14:51.1866951Z Traceback (most recent call last): 2025-12-04T09:14:51.1867654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1868359Z getattr(self, test_name)() 2025-12-04T09:14:51.1869031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1869710Z fn() 2025-12-04T09:14:51.1870278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1870961Z method(*args, **kwargs) 2025-12-04T09:14:51.1871581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1872244Z method(*args, **kwargs) 2025-12-04T09:14:51.1872857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1873512Z with policy(): 2025-12-04T09:14:51.1874104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1874857Z raise RuntimeError(msg) 2025-12-04T09:14:51.1876006Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 456065024 and is now 458162176. 2025-12-04T09:14:51.1877101Z 2025-12-04T09:14:51.1877292Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1878102Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T09:14:51.1878885Z 2025-12-04T09:14:51.1879318Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1879900Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:14:51.1880401Z ======================= 1 failed, 2 deselected in 7.35s ======================== 2025-12-04T09:14:51.1880827Z Got exit code 1 2025-12-04T09:14:51.1881091Z Retrying single test... 
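Separate from the leak itself, the UserWarning repeated in the runs above ("FSDP got the argument `device_id` cuda on rank N, which does not have an explicit index") is emitted because the test hands FSDP the bare device string "cuda". The warning's own remedy is to call torch.cuda.set_device() before wrapping or to pass an explicitly indexed device as device_id. A minimal, hedged sketch of that remedy follows; model and the already-initialized process group are assumptions, not code taken from the test above.

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    rank = dist.get_rank()
    # Assumes one visible GPU per local rank, as in this 2-GPU test setup.
    torch.cuda.set_device(rank)  # make the current device unambiguous
    wrapped = FSDP(model, device_id=torch.device("cuda", rank))  # explicit index, no warning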
2025-12-04T09:14:51.1881898Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-320024c0c6bb40b5.xml 2025-12-04T09:14:51.1882834Z ============================= test session starts ============================== 2025-12-04T09:14:51.1883485Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:14:51.1884060Z cachedir: .pytest_cache 2025-12-04T09:14:51.1884757Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:14:51.1885522Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:14:51.1885873Z configfile: pytest.ini 2025-12-04T09:14:51.1886581Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:14:51.1887460Z collecting ... collected 3 items / 2 deselected / 1 selected 2025-12-04T09:14:51.1888588Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda 2025-12-04T09:14:51.1889470Z Running 1 items in this shard 2025-12-04T09:14:51.1889678Z 2025-12-04T09:14:51.1890623Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda I1204 09:14:39.189000 30622 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 30674 2025-12-04T09:14:51.1892297Z I1204 09:14:39.190000 30622 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 30675 2025-12-04T09:14:51.1894250Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:14:51.1895761Z self.encoder = TransformerEncoder( 2025-12-04T09:14:51.1897722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:14:51.1899721Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.1901244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:14:51.1902734Z self.encoder = TransformerEncoder( 2025-12-04T09:14:51.1904785Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:14:51.1906902Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:14:51.1907564Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:14:51.1908571Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:14:51.1910051Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1911503Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:14:51.1912950Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1914306Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:14:51.1915643Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1917039Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1918451Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1919900Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1921300Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1922664Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:14:51.1924036Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1925453Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:14:51.1927381Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 347013120 and is now 349110272. 
2025-12-04T09:14:51.1929197Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1930226Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1931825Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T09:14:51.1933206Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1934588Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1936002Z [rank1]:E1204 09:14:44.780000 30675 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:14:51.1937136Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:14:51.1938255Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:14:51.1939913Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1941559Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:14:51.1943193Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1944712Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:14:51.1946247Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1947648Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1949113Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1950513Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:14:51.1951924Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1953288Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:14:51.1954652Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1956072Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:14:51.1958003Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 453967872 and is now 458162176. 2025-12-04T09:14:51.1959810Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1960838Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1962486Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T09:14:51.1963820Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:14:51.1964905Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1966144Z [rank0]:E1204 09:14:44.793000 30674 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:14:51.1966837Z dist init r=1, world=2 2025-12-04T09:14:51.1967089Z dist init r=0, world=2 2025-12-04T09:14:51.1968273Z [rank0]:[W1204 09:14:45.271218490 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:14:51.1969499Z FAILED [7.1951s] [100%] 2025-12-04T09:14:51.1969655Z 2025-12-04T09:14:51.1969785Z =================================== FAILURES =================================== 2025-12-04T09:14:51.1970280Z _______________ TestApplyCUDA.test_transformer_module_apply_cuda _______________ 2025-12-04T09:14:51.1970753Z Traceback (most recent call last): 2025-12-04T09:14:51.1971441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:14:51.1972130Z self._join_processes(fn) 2025-12-04T09:14:51.1972839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:14:51.1973678Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:14:51.1974713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:14:51.1975563Z raise RuntimeError(error) 2025-12-04T09:14:51.1976085Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:14:51.1976568Z Traceback (most recent call last): 2025-12-04T09:14:51.1977331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1978113Z getattr(self, test_name)() 2025-12-04T09:14:51.1979050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1979803Z fn() 2025-12-04T09:14:51.1980439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1981183Z method(*args, **kwargs) 2025-12-04T09:14:51.1981887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1982623Z method(*args, **kwargs) 2025-12-04T09:14:51.1983328Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.1984073Z with policy(): 2025-12-04T09:14:51.1984728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.1985485Z raise RuntimeError(msg) 2025-12-04T09:14:51.1986782Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 347013120 and is now 349110272. 2025-12-04T09:14:51.1988007Z 2025-12-04T09:14:51.1988234Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.1989282Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T09:14:51.1989983Z 2025-12-04T09:14:51.1990255Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.1990859Z 2025-12-04T09:14:51.1990863Z 2025-12-04T09:14:51.1991060Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:14:51.1991614Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:14:51.1992686Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-320024c0c6bb40b5.xml - 2025-12-04T09:14:51.1993842Z =========================== short test summary info ============================ 2025-12-04T09:14:51.1994823Z FAILED [7.1951s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:14:51.1995750Z Traceback (most recent call last): 2025-12-04T09:14:51.1996493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:14:51.1997232Z getattr(self, test_name)() 2025-12-04T09:14:51.1997939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:14:51.1998652Z fn() 2025-12-04T09:14:51.1999249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.1999954Z method(*args, **kwargs) 2025-12-04T09:14:51.2000613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:14:51.2001313Z method(*args, **kwargs) 2025-12-04T09:14:51.2001965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:14:51.2002663Z with policy(): 2025-12-04T09:14:51.2003366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:14:51.2004073Z raise RuntimeError(msg) 2025-12-04T09:14:51.2005294Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 347013120 and is now 349110272. 2025-12-04T09:14:51.2006445Z 2025-12-04T09:14:51.2006644Z To execute this test, run the following from the base repo dir: 2025-12-04T09:14:51.2007495Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T09:14:51.2008143Z 2025-12-04T09:14:51.2008403Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:14:51.2008948Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
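Each attempt also ends with ProcessGroupNCCL warning that destroy_process_group() was never called before program exit. That is a teardown-hygiene issue rather than the cause of this failure, but the fix the warning points at is to pair init_process_group with an explicit destroy. A generic sketch, assuming the usual env:// rendezvous variables (RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT, LOCAL_RANK) are provided by a launcher such as torchrun; this is not the test harness's own teardown code.

    import os
    import torch
    import torch.distributed as dist

    def main():
        dist.init_process_group(backend="nccl")  # rendezvous info comes from the launcher env
        torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
        try:
            pass  # collectives / model work go here
        finally:
            # Explicit teardown avoids the ProcessGroupNCCL shutdown warning seen above.
            dist.destroy_process_group()

    if __name__ == "__main__":
        main()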
2025-12-04T09:14:51.2009501Z ======================= 1 failed, 2 deselected in 7.40s ======================== 2025-12-04T09:14:51.2009871Z Got exit code 1 2025-12-04T09:14:51.2010422Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda 2025-12-04T09:14:51.2011319Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:14:51.2012358Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-f868d67f33e24985.xml 2025-12-04T09:14:51.2013190Z ============================= test session starts ============================== 2025-12-04T09:14:51.2014016Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:14:51.2014670Z cachedir: .pytest_cache 2025-12-04T09:14:51.2015372Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:14:51.2016132Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:14:51.2016482Z configfile: pytest.ini 2025-12-04T09:14:51.2017196Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:14:51.2018063Z collecting ... collected 3 items / 3 deselected / 0 selected 2025-12-04T09:14:51.2018527Z stepcurrent: skipping 3 already run items. 2025-12-04T09:14:51.2018903Z Running 0 items in this shard 2025-12-04T09:14:51.2019106Z 2025-12-04T09:14:51.2019924Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-f868d67f33e24985.xml - 2025-12-04T09:14:51.2021026Z ============================ 3 deselected in 0.01s ============================= 2025-12-04T09:14:51.2022824Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda', 'test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda', 'test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda'] 2025-12-04T09:14:51.2024426Z 2025-12-04T09:14:51.2025046Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_apply 1/1 (test/test-reports/distributed.fsdp.test_fsdp_apply_1.1_ffe46bf2b700541c_.log) 2025-12-04T09:14:51.2025810Z 2025-12-04T09:14:51.2026275Z Finished distributed/fsdp/test_fsdp_apply 1/1 ... 
[2025-12-04 09:14:51.083880][1317.186008688], took 1.77min 2025-12-04T09:14:51.2027502Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-550d70029afd2dcd.xml 2025-12-04T09:14:51.2246540Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-20b4ace7c31b01bc.xml 2025-12-04T09:14:51.2692909Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-fca413b3f7307fd5.xml 2025-12-04T09:14:51.3203412Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-ee4bf9b90915483d.xml 2025-12-04T09:14:51.3572118Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-5878d33d525e22d1.xml 2025-12-04T09:14:51.4120441Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-71ced988c25116a1.xml 2025-12-04T09:14:51.4452629Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-00e807acf0912dba.xml 2025-12-04T09:14:51.4722799Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-b604ebea332b8d41.xml 2025-12-04T09:14:51.5040118Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-320024c0c6bb40b5.xml 2025-12-04T09:14:51.5323229Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-f868d67f33e24985.xml 2025-12-04T09:14:51.7215472Z Uploading logs for 57116084892 to S3 2025-12-04T09:14:51.7933061Z Uploading artifacts took 0.24 seconds 2025-12-04T09:14:51.7934177Z distributed/fsdp/test_fsdp_apply 1/1 failed! 2025-12-04T09:14:51.7936437Z Running distributed/fsdp/test_fsdp_multiple_wrapping 1/1 ... [2025-12-04 09:14:51.793498][1317.895629307] 2025-12-04T09:14:51.7937101Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:14:51.7940114Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_multiple_wrapping.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 09:14:51.793818] 2025-12-04T09:15:35.3556769Z 2025-12-04T09:15:35.3557815Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_multiple_wrapping 1/1 (test/test-reports/distributed.fsdp.test_fsdp_multiple_wrapping_1.1_4a76a0d00df8da58_.log) 2025-12-04T09:15:35.3559446Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-7c59f12ab3dc26b8.xml 2025-12-04T09:15:35.3560505Z ============================= test session starts ============================== 2025-12-04T09:15:35.3561203Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:15:35.3561778Z cachedir: .pytest_cache 2025-12-04T09:15:35.3562491Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:15:35.3563258Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:15:35.3563614Z configfile: pytest.ini 2025-12-04T09:15:35.3564307Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:15:35.3565077Z collecting ... collected 1 item 2025-12-04T09:15:35.3565480Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:15:35.3566389Z Running 1 items in this shard: test/distributed/fsdp/test_fsdp_multiple_wrapping.py::TestMultipleWrappingCUDA::test_multiple_wrapping_cuda 2025-12-04T09:15:35.3567121Z 2025-12-04T09:15:35.3568133Z distributed/fsdp/test_fsdp_multiple_wrapping.py::TestMultipleWrappingCUDA::test_multiple_wrapping_cuda I1204 09:14:55.200000 30865 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 30917 2025-12-04T09:15:35.3570057Z I1204 09:14:55.201000 30865 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 30918 2025-12-04T09:15:35.3571191Z I1204 09:14:55.201000 30865 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 30919 2025-12-04T09:15:35.3572287Z I1204 09:14:55.202000 30865 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 30920 2025-12-04T09:15:35.3574884Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:15:35.3576916Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:15:35.3579137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:15:35.3581146Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:15:35.3583150Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:15:35.3585445Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:15:35.3587438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:15:35.3589431Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:15:35.3590198Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:15:35.3591430Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:15:35.3593058Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.3594657Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:15:35.3596249Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.3597734Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:15:35.3599201Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3600757Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.3602369Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3603918Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.3605465Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.3606966Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:15:35.3608471Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.3610035Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:15:35.3612194Z [rank2]:E1204 09:15:02.071000 
30919 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:15:35.3614490Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.3615668Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.3617661Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.3631675Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.3633031Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.3634507Z [rank2]:E1204 09:15:02.071000 30919 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:15:35.3635596Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:15:35.3636682Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:15:35.3638279Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.3639832Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:15:35.3641374Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.3642818Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:15:35.3644242Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3645877Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.3647371Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3648870Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.3650379Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.3651846Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:15:35.3653413Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.3655168Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:15:35.3657402Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 714014720 and is now 737083392. 2025-12-04T09:15:35.3659500Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.3660757Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.3662684Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.3664284Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.3665518Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.3666927Z [rank0]:E1204 09:15:02.072000 30917 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:15:35.3667948Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:15:35.3668945Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:15:35.3670434Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.3671893Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:15:35.3673357Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.3674717Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:15:35.3676105Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:15:35.3677520Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.3679271Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3681042Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.3682635Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.3684186Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:15:35.3685751Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.3687524Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:15:35.3689752Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 3. CUDA driver allocated memory was 487522304 and is now 628031488. 2025-12-04T09:15:35.3692078Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.3693208Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.3695361Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.3696972Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.3698203Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.3699624Z [rank3]:E1204 09:15:02.072000 30920 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:15:35.3700768Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:15:35.3701900Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:15:35.3703581Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.3705336Z [rank1]:E1204 09:15:02.072000 30918 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:15:35.3707000Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.3708527Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:15:35.3709976Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3711577Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.3713085Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3714568Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.3716070Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.3717526Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:15:35.3718991Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.3720686Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:15:35.3722845Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 1. CUDA driver allocated memory was 602865664 and is now 628031488. 
2025-12-04T09:15:35.3724920Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.3726063Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.3727917Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.3729473Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.3730658Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.3732118Z [rank1]:E1204 09:15:02.072000 30918 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:15:35.3732867Z dist init r=2, world=4 2025-12-04T09:15:35.3733141Z dist init r=0, world=4 2025-12-04T09:15:35.3733457Z dist init r=1, world=4 2025-12-04T09:15:35.3733724Z dist init r=3, world=4 2025-12-04T09:15:35.3734164Z FAILED [8.4936s] [100%] 2025-12-04T09:15:35.3734342Z 2025-12-04T09:15:35.3734493Z =================================== FAILURES =================================== 2025-12-04T09:15:35.3735095Z _____________ TestMultipleWrappingCUDA.test_multiple_wrapping_cuda _____________ 2025-12-04T09:15:35.3735661Z Traceback (most recent call last): 2025-12-04T09:15:35.3736455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:15:35.3737252Z self._join_processes(fn) 2025-12-04T09:15:35.3738120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:15:35.3738993Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:15:35.3739868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:15:35.3740735Z raise RuntimeError(error) 2025-12-04T09:15:35.3741194Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:15:35.3741694Z Traceback (most recent call last): 2025-12-04T09:15:35.3742468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.3743269Z getattr(self, test_name)() 2025-12-04T09:15:35.3744027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.3744791Z fn() 2025-12-04T09:15:35.3745444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3746389Z method(*args, **kwargs) 2025-12-04T09:15:35.3747027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3747685Z method(*args, **kwargs) 2025-12-04T09:15:35.3748317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.3748990Z with policy(): 2025-12-04T09:15:35.3749587Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.3750271Z raise RuntimeError(msg) 2025-12-04T09:15:35.3751525Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:15:35.3752648Z 2025-12-04T09:15:35.3752855Z To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.3753757Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.3754458Z 2025-12-04T09:15:35.3754696Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.3755065Z 2025-12-04T09:15:35.3755069Z 2025-12-04T09:15:35.3755272Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:15:35.3755831Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:15:35.3757028Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-7c59f12ab3dc26b8.xml - 2025-12-04T09:15:35.3758327Z =========================== short test summary info ============================ 2025-12-04T09:15:35.3759431Z FAILED [8.4936s] distributed/fsdp/test_fsdp_multiple_wrapping.py::TestMultipleWrappingCUDA::test_multiple_wrapping_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:15:35.3760459Z Traceback (most recent call last): 2025-12-04T09:15:35.3761206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.3761948Z getattr(self, test_name)() 2025-12-04T09:15:35.3762659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.3763387Z fn() 2025-12-04T09:15:35.3763992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3764709Z method(*args, **kwargs) 2025-12-04T09:15:35.3765443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3766157Z method(*args, **kwargs) 2025-12-04T09:15:35.3766819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.3767521Z with policy(): 2025-12-04T09:15:35.3768162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.3768886Z raise RuntimeError(msg) 2025-12-04T09:15:35.3770225Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 
2025-12-04T09:15:35.3771359Z 2025-12-04T09:15:35.3771554Z To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.3772457Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.3773160Z 2025-12-04T09:15:35.3773489Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.3774230Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:15:35.3774743Z ============================== 1 failed in 8.52s =============================== 2025-12-04T09:15:35.3775150Z Got exit code 1 2025-12-04T09:15:35.3775411Z Retrying single test... 2025-12-04T09:15:35.3776379Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-fd5feab7f24ea67e.xml 2025-12-04T09:15:35.3777519Z ============================= test session starts ============================== 2025-12-04T09:15:35.3778176Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:15:35.3778955Z cachedir: .pytest_cache 2025-12-04T09:15:35.3779674Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:15:35.3780464Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:15:35.3780822Z configfile: pytest.ini 2025-12-04T09:15:35.3781539Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:15:35.3782333Z collecting ... collected 1 item 2025-12-04T09:15:35.3783304Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_multiple_wrapping.py::TestMultipleWrappingCUDA::test_multiple_wrapping_cuda 2025-12-04T09:15:35.3784286Z Running 1 items in this shard 2025-12-04T09:15:35.3784510Z 2025-12-04T09:15:35.3785560Z distributed/fsdp/test_fsdp_multiple_wrapping.py::TestMultipleWrappingCUDA::test_multiple_wrapping_cuda I1204 09:15:08.409000 31194 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 31246 2025-12-04T09:15:35.3787246Z I1204 09:15:08.410000 31194 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 31247 2025-12-04T09:15:35.3788378Z I1204 09:15:08.411000 31194 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 31248 2025-12-04T09:15:35.3789507Z I1204 09:15:08.412000 31194 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 31249 2025-12-04T09:15:35.3791916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:15:35.3793992Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:15:35.3796155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:15:35.3798056Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:15:35.3799964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:15:35.3801742Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:15:35.3803508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:15:35.3805277Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:15:35.3805957Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:15:35.3806964Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:15:35.3808520Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.3809976Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:15:35.3811435Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.3812794Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:15:35.3814446Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3816039Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.3817637Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3819227Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.3820823Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.3822373Z [rank0]:E1204 09:15:15.248000 31246 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:15:35.3824036Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.3825634Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:15:35.3827763Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 714014720 and is now 737083392. 2025-12-04T09:15:35.3829608Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.3830654Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.3832344Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.3833777Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.3834865Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.3836120Z [rank0]:E1204 09:15:15.248000 31246 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:15:35.3837176Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:15:35.3838185Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:15:35.3839676Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.3841135Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:15:35.3842592Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.3843935Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:15:35.3845277Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3846691Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.3848104Z 
[rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3849510Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.3850909Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.3852333Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:15:35.3853960Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.3855561Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:15:35.3857784Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 1. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:15:35.3859860Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.3861041Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.3862951Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.3864575Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.3865912Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.3867324Z [rank1]:E1204 09:15:15.248000 31247 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:15:35.3868344Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:15:35.3869349Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:15:35.3870829Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.3872276Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:15:35.3873730Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.3875086Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:15:35.3876426Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3877836Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.3879681Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3881283Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.3882996Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.3884551Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:15:35.3886111Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.3887700Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:15:35.3889926Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 2. CUDA driver allocated memory was 602865664 and is now 628031488. 
2025-12-04T09:15:35.3892049Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.3893090Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.3895154Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.3896752Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.3898066Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.3899477Z [rank2]:E1204 09:15:15.248000 31248 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:15:35.3900624Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:15:35.3901744Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:15:35.3903443Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.3905095Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:15:35.3906856Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.3908212Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:15:35.3909549Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3910964Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.3912364Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3913784Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.3915259Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.3916640Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:15:35.3918024Z [rank3]:E1204 09:15:15.248000 31249 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.3919431Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:15:35.3921416Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 3. CUDA driver allocated memory was 487522304 and is now 628031488. 2025-12-04T09:15:35.3923257Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.3924304Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.3926002Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.3927484Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.3928567Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.3929814Z [rank3]:E1204 09:15:15.248000 31249 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:15:35.3930520Z dist init r=2, world=4 2025-12-04T09:15:35.3930768Z dist init r=3, world=4 2025-12-04T09:15:35.3931019Z dist init r=0, world=4 2025-12-04T09:15:35.3931267Z dist init r=1, world=4 2025-12-04T09:15:35.3931501Z FAILED [8.4995s] [100%] 2025-12-04T09:15:35.3931671Z 2025-12-04T09:15:35.3931805Z =================================== FAILURES =================================== 2025-12-04T09:15:35.3932336Z _____________ TestMultipleWrappingCUDA.test_multiple_wrapping_cuda _____________ 2025-12-04T09:15:35.3932839Z Traceback (most recent call last): 2025-12-04T09:15:35.3933592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:15:35.3934553Z self._join_processes(fn) 2025-12-04T09:15:35.3935362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:15:35.3936242Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:15:35.3937118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:15:35.3937986Z raise RuntimeError(error) 2025-12-04T09:15:35.3938442Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:15:35.3938931Z Traceback (most recent call last): 2025-12-04T09:15:35.3939715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.3940518Z getattr(self, test_name)() 2025-12-04T09:15:35.3941383Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.3942142Z fn() 2025-12-04T09:15:35.3942789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3943549Z method(*args, **kwargs) 2025-12-04T09:15:35.3944250Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3945011Z method(*args, **kwargs) 2025-12-04T09:15:35.3945727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.3946537Z with policy(): 2025-12-04T09:15:35.3947131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.3947822Z raise RuntimeError(msg) 2025-12-04T09:15:35.3949018Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 3. CUDA driver allocated memory was 487522304 and is now 628031488. 2025-12-04T09:15:35.3950141Z 2025-12-04T09:15:35.3950346Z To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.3951235Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.3951952Z 2025-12-04T09:15:35.3952193Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.3952560Z 2025-12-04T09:15:35.3952564Z 2025-12-04T09:15:35.3952762Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:15:35.3953387Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:15:35.3954575Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-fd5feab7f24ea67e.xml - 2025-12-04T09:15:35.3955689Z =========================== short test summary info ============================ 2025-12-04T09:15:35.3956728Z FAILED [8.4995s] distributed/fsdp/test_fsdp_multiple_wrapping.py::TestMultipleWrappingCUDA::test_multiple_wrapping_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:15:35.3957703Z Traceback (most recent call last): 2025-12-04T09:15:35.3958397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.3959111Z getattr(self, test_name)() 2025-12-04T09:15:35.3959787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.3960483Z fn() 2025-12-04T09:15:35.3961047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3961717Z method(*args, **kwargs) 2025-12-04T09:15:35.3962351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.3963014Z method(*args, **kwargs) 2025-12-04T09:15:35.3963650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.3964315Z with policy(): 2025-12-04T09:15:35.3964922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.3965591Z raise RuntimeError(msg) 2025-12-04T09:15:35.3966779Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 3. CUDA driver allocated memory was 487522304 and is now 628031488. 2025-12-04T09:15:35.3967912Z 2025-12-04T09:15:35.3968161Z To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.3969066Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.3969768Z 2025-12-04T09:15:35.3970005Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.3970535Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:15:35.3970964Z ============================== 1 failed in 8.52s =============================== 2025-12-04T09:15:35.3971321Z Got exit code 1 2025-12-04T09:15:35.3971556Z Retrying single test... 
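Each attempt also repeats the FSDP `device_id` UserWarning shown above. As a hedged sketch of the two remedies that warning itself suggests (the module below is a placeholder, the helper name is made up for illustration, and a process group is assumed to already be initialized; none of this is part of the test under retry):

```python
import torch
from torch import nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_on_rank(rank: int) -> FSDP:
    # Option 1 from the warning: make the current device explicit before
    # FSDP initialization, so an index-less device_id resolves correctly.
    torch.cuda.set_device(rank)

    model = nn.Linear(8, 8)  # placeholder module, assumption for illustration

    # Option 2 from the warning: pass an explicitly indexed device rather
    # than the bare "cuda" string, which avoids the warning altogether.
    return FSDP(model, device_id=torch.device("cuda", rank))
```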
2025-12-04T09:15:35.3972409Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-42ba77c7c8182ea3.xml 2025-12-04T09:15:35.3973437Z ============================= test session starts ============================== 2025-12-04T09:15:35.3974218Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:15:35.3974819Z cachedir: .pytest_cache 2025-12-04T09:15:35.3975532Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:15:35.3976313Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:15:35.3976652Z configfile: pytest.ini 2025-12-04T09:15:35.3977376Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:15:35.3978167Z collecting ... collected 1 item 2025-12-04T09:15:35.3979300Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_multiple_wrapping.py::TestMultipleWrappingCUDA::test_multiple_wrapping_cuda 2025-12-04T09:15:35.3980411Z Running 1 items in this shard 2025-12-04T09:15:35.3980636Z 2025-12-04T09:15:35.3981692Z distributed/fsdp/test_fsdp_multiple_wrapping.py::TestMultipleWrappingCUDA::test_multiple_wrapping_cuda I1204 09:15:21.599000 31523 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 31575 2025-12-04T09:15:35.3983378Z I1204 09:15:21.600000 31523 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 31576 2025-12-04T09:15:35.3984517Z I1204 09:15:21.601000 31523 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 31577 2025-12-04T09:15:35.3985641Z I1204 09:15:21.602000 31523 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 31578 2025-12-04T09:15:35.3987998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:15:35.3990014Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:15:35.3991972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:15:35.3993755Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:15:35.3995588Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:15:35.3997374Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:15:35.3999147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:15:35.4000927Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:15:35.4001609Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:15:35.4002612Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:15:35.4004104Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.4005569Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:15:35.4007029Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.4008386Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:15:35.4009787Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.4011198Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.4012614Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.4014305Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.4015903Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.4017450Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:15:35.4019021Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.4020616Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:15:35.4022845Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 1. CUDA driver allocated memory was 598671360 and is now 628031488. 2025-12-04T09:15:35.4024941Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.4026256Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.4027953Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.4029381Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.4030475Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.4031722Z [rank1]:E1204 09:15:28.425000 31576 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:15:35.4032731Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:15:35.4033740Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:15:35.4035232Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.4036696Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:15:35.4038141Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.4039565Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:15:35.4040911Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.4042327Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.4043741Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.4045137Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.4046566Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.4047943Z [rank3]:E1204 
09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:15:35.4049330Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.4050759Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:15:35.4052723Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 3. CUDA driver allocated memory was 487522304 and is now 628031488. 2025-12-04T09:15:35.4055073Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.4056251Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.4058163Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.4059780Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.4061004Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.4062427Z [rank3]:E1204 09:15:28.425000 31578 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:15:35.4063575Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:15:35.4064710Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:15:35.4066435Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.4067900Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:15:35.4069412Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.4070785Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:15:35.4072120Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.4073515Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:15:35.4074934Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.4076352Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.4077773Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.4079483Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:15:35.4081029Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.4082628Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:15:35.4086358Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 714014720 and is now 737083392. 2025-12-04T09:15:35.4088475Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.4089645Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.4091632Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.4093068Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.4094469Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.4095878Z [rank0]:E1204 09:15:28.426000 31575 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:15:35.4097012Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:15:35.4098143Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:15:35.4099823Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.4101580Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:15:35.4103232Z [rank2]:E1204 09:15:28.426000 31577 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.4104744Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:15:35.4106425Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.4107839Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.4109258Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.4110673Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:15:35.4112074Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.4113450Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:15:35.4114830Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.4116251Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:15:35.4118283Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 2. CUDA driver allocated memory was 604962816 and is now 628031488. 
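
The numbers in the message above can be read directly as deltas; a tiny illustration using the values reported for device 2:

    # Values copied from the RuntimeError message above (device 2).
    caching_delta = 1024 - 512                # 512 bytes left in the caching allocator
    driver_delta = 628031488 - 604962816      # 23,068,672 bytes of driver growth
    print(driver_delta / (1024 ** 2))         # 22.0 MiB
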
2025-12-04T09:15:35.4120116Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.4121158Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.4122851Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.4124295Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:15:35.4125390Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.4126635Z [rank2]:E1204 09:15:28.426000 31577 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:15:35.4127347Z dist init r=2, world=4 2025-12-04T09:15:35.4127606Z dist init r=0, world=4 2025-12-04T09:15:35.4127849Z dist init r=1, world=4 2025-12-04T09:15:35.4128100Z dist init r=3, world=4 2025-12-04T09:15:35.4128350Z FAILED [8.4843s] [100%] 2025-12-04T09:15:35.4128506Z 2025-12-04T09:15:35.4128654Z =================================== FAILURES =================================== 2025-12-04T09:15:35.4129244Z _____________ TestMultipleWrappingCUDA.test_multiple_wrapping_cuda _____________ 2025-12-04T09:15:35.4129745Z Traceback (most recent call last): 2025-12-04T09:15:35.4130454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:15:35.4131151Z self._join_processes(fn) 2025-12-04T09:15:35.4131869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:15:35.4132648Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:15:35.4133507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:15:35.4134507Z raise RuntimeError(error) 2025-12-04T09:15:35.4134966Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:15:35.4135468Z Traceback (most recent call last): 2025-12-04T09:15:35.4136238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.4137031Z getattr(self, test_name)() 2025-12-04T09:15:35.4137784Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.4138553Z fn() 2025-12-04T09:15:35.4139187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.4139940Z method(*args, **kwargs) 2025-12-04T09:15:35.4140657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.4141399Z method(*args, **kwargs) 2025-12-04T09:15:35.4142109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.4142865Z with policy(): 2025-12-04T09:15:35.4143544Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.4144296Z raise RuntimeError(msg) 2025-12-04T09:15:35.4145808Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 1. CUDA driver allocated memory was 598671360 and is now 628031488. 2025-12-04T09:15:35.4146941Z 2025-12-04T09:15:35.4147135Z To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.4148033Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.4148734Z 2025-12-04T09:15:35.4148974Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.4149350Z 2025-12-04T09:15:35.4149353Z 2025-12-04T09:15:35.4149552Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:15:35.4150112Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:15:35.4151308Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-42ba77c7c8182ea3.xml - 2025-12-04T09:15:35.4152403Z =========================== short test summary info ============================ 2025-12-04T09:15:35.4153441Z FAILED [8.4843s] distributed/fsdp/test_fsdp_multiple_wrapping.py::TestMultipleWrappingCUDA::test_multiple_wrapping_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:15:35.4154415Z Traceback (most recent call last): 2025-12-04T09:15:35.4155123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:15:35.4155883Z getattr(self, test_name)() 2025-12-04T09:15:35.4156560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:15:35.4157251Z fn() 2025-12-04T09:15:35.4157834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.4158497Z method(*args, **kwargs) 2025-12-04T09:15:35.4159133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:15:35.4159810Z method(*args, **kwargs) 2025-12-04T09:15:35.4160436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:15:35.4161111Z with policy(): 2025-12-04T09:15:35.4161724Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:15:35.4162412Z raise RuntimeError(msg) 2025-12-04T09:15:35.4163597Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestMultipleWrappingCUDA.test_multiple_wrapping_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 1. CUDA driver allocated memory was 598671360 and is now 628031488. 
2025-12-04T09:15:35.4164729Z 2025-12-04T09:15:35.4164923Z To execute this test, run the following from the base repo dir: 2025-12-04T09:15:35.4165829Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_multiple_wrapping.py TestMultipleWrappingCUDA.test_multiple_wrapping_cuda 2025-12-04T09:15:35.4166535Z 2025-12-04T09:15:35.4166790Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:15:35.4167306Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:15:35.4167737Z ============================== 1 failed in 8.51s =============================== 2025-12-04T09:15:35.4168107Z Got exit code 1 2025-12-04T09:15:35.4168781Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_multiple_wrapping.py::TestMultipleWrappingCUDA::test_multiple_wrapping_cuda 2025-12-04T09:15:35.4169818Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:15:35.4170984Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-c4d1f6f933180ae5.xml 2025-12-04T09:15:35.4171936Z ============================= test session starts ============================== 2025-12-04T09:15:35.4172516Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:15:35.4173037Z cachedir: .pytest_cache 2025-12-04T09:15:35.4173905Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:15:35.4174746Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:15:35.4175093Z configfile: pytest.ini 2025-12-04T09:15:35.4175822Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:15:35.4176701Z collecting ... collected 1 item / 1 deselected / 0 selected 2025-12-04T09:15:35.4177183Z stepcurrent: skipping 1 already run items. 2025-12-04T09:15:35.4177555Z Running 0 items in this shard 2025-12-04T09:15:35.4177778Z 2025-12-04T09:15:35.4178922Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-c4d1f6f933180ae5.xml - 2025-12-04T09:15:35.4180178Z ============================ 1 deselected in 0.01s ============================= 2025-12-04T09:15:35.4181182Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_multiple_wrapping.py::TestMultipleWrappingCUDA::test_multiple_wrapping_cuda'] 2025-12-04T09:15:35.4182121Z 2025-12-04T09:15:35.4182863Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_multiple_wrapping 1/1 (test/test-reports/distributed.fsdp.test_fsdp_multiple_wrapping_1.1_4a76a0d00df8da58_.log) 2025-12-04T09:15:35.4183746Z 2025-12-04T09:15:35.4184202Z Finished distributed/fsdp/test_fsdp_multiple_wrapping 1/1 ... 
[2025-12-04 09:15:35.355664][1361.457793715], took 0.73min 2025-12-04T09:15:35.4185790Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-7c59f12ab3dc26b8.xml 2025-12-04T09:15:35.4473722Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-fd5feab7f24ea67e.xml 2025-12-04T09:15:35.4916356Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-42ba77c7c8182ea3.xml 2025-12-04T09:15:35.5351334Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-c4d1f6f933180ae5.xml 2025-12-04T09:15:35.7895242Z Uploading logs for 57116084892 to S3 2025-12-04T09:15:35.8266376Z Uploading artifacts took 0.25 seconds 2025-12-04T09:15:35.8266977Z distributed/fsdp/test_fsdp_multiple_wrapping 1/1 failed! 2025-12-04T09:15:35.8269994Z Running distributed/fsdp/test_fsdp_fine_tune 1/1 ... [2025-12-04 09:15:35.826725][1361.928856313] 2025-12-04T09:15:35.8270602Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:15:35.8272072Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_fine_tune.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:15:35.827031] 2025-12-04T09:18:20.5626970Z 2025-12-04T09:18:20.5628003Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_fine_tune 1/1 (test/test-reports/distributed.fsdp.test_fsdp_fine_tune_1.1_200ce5473d48270d_.log) 2025-12-04T09:18:20.5630632Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-e7575131d09c7d5b.xml 2025-12-04T09:18:20.5632209Z ============================= test session starts ============================== 2025-12-04T09:18:20.5633129Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:18:20.5634037Z cachedir: .pytest_cache 2025-12-04T09:18:20.5634930Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:18:20.5635699Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:18:20.5636132Z configfile: pytest.ini 2025-12-04T09:18:20.5636844Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:18:20.5637706Z collecting ... 
collected 4 items 2025-12-04T09:18:20.5638114Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:18:20.5640862Z Running 4 items in this shard: test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda, test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda, test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda, test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda 2025-12-04T09:18:20.5643111Z 2025-12-04T09:18:20.5644102Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda I1204 09:15:39.259000 31909 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 31961 2025-12-04T09:18:20.5645911Z I1204 09:15:39.260000 31909 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 31962 2025-12-04T09:18:20.5648348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.5650302Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.5652238Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.5654459Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.5655770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:18:20.5657015Z return func(*args, **kwargs) 2025-12-04T09:18:20.5658186Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.5659403Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.5660594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.5661813Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.5662965Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.5664206Z seq = FSDP( 2025-12-04T09:18:20.5665287Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:18:20.5666901Z seq = FSDP( 2025-12-04T09:18:20.5671415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:18:20.5676265Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.5681567Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:18:20.5686660Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.5687667Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.5688817Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.5690608Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.5692209Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.5694054Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.5695594Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.5697102Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.5698790Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.5700385Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.5701966Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.5703567Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.5705356Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.5706954Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.5708450Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.5710569Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 0. CUDA driver allocated memory was 453967872 and is now 500105216. 
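
The UserWarnings a few entries above (FSDP receiving a bare `device_id` of "cuda" with no index, and `barrier()` falling back to the device under the current context) both point at making the per-rank device explicit. A hedged sketch of the calls those warnings suggest; the rank/world-size wiring is illustrative and assumes the process was launched by torchrun or an equivalent launcher that sets the rendezvous environment variables:

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def setup(rank: int, world_size: int, model: torch.nn.Module):
        # Make the current device explicit before any collectives, as the
        # FSDP warning recommends.
        torch.cuda.set_device(rank)
        # Passing device_id to init_process_group addresses the barrier()
        # warning about "using the device under current context".
        dist.init_process_group(
            "nccl", rank=rank, world_size=world_size,
            device_id=torch.device("cuda", rank),
        )
        # Give FSDP an indexed device rather than the bare "cuda" string.
        return FSDP(model.cuda(rank), device_id=torch.device("cuda", rank))
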
2025-12-04T09:18:20.5712730Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.5714039Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.5715995Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T09:18:20.5717507Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.5718688Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.5720057Z [rank0]:E1204 09:15:45.240000 31961 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:18:20.5721181Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.5722284Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.5723901Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.5725499Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.5727097Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.5728590Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.5731484Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.5733082Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.5734909Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.5736506Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.5738102Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.5739672Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.5741217Z [rank1]:E1204 09:15:45.240000 31962 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.5742820Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.5745038Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 1. CUDA driver allocated memory was 342818816 and is now 391053312. 2025-12-04T09:18:20.5747270Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.5748312Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.5749951Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T09:18:20.5751340Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.5752434Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.5753692Z [rank1]:E1204 09:15:45.240000 31962 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:18:20.5754393Z dist init r=0, world=2 2025-12-04T09:18:20.5754658Z dist init r=1, world=2 2025-12-04T09:18:20.5755856Z [rank0]:[W1204 09:15:45.707322541 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:18:20.5757091Z FAILED [7.9933s] [ 25%] 2025-12-04T09:18:20.5757252Z 2025-12-04T09:18:20.5757387Z =================================== FAILURES =================================== 2025-12-04T09:18:20.5757907Z ____________ TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda _____________ 2025-12-04T09:18:20.5758397Z Traceback (most recent call last): 2025-12-04T09:18:20.5759096Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:18:20.5759799Z self._join_processes(fn) 2025-12-04T09:18:20.5760562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:18:20.5761333Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:18:20.5762105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:18:20.5762872Z raise RuntimeError(error) 2025-12-04T09:18:20.5763278Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:18:20.5763721Z Traceback (most recent call last): 2025-12-04T09:18:20.5764404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.5765114Z getattr(self, test_name)() 2025-12-04T09:18:20.5765789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.5766462Z fn() 2025-12-04T09:18:20.5767043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.5767911Z method(*args, **kwargs) 2025-12-04T09:18:20.5768581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.5769281Z method(*args, **kwargs) 2025-12-04T09:18:20.5769948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.5770656Z with policy(): 2025-12-04T09:18:20.5771284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.5772006Z raise RuntimeError(msg) 2025-12-04T09:18:20.5773400Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 1. CUDA driver allocated memory was 342818816 and is now 391053312. 2025-12-04T09:18:20.5774813Z 2025-12-04T09:18:20.5775045Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.5775995Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T09:18:20.5776744Z 2025-12-04T09:18:20.5777012Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.5777429Z 2025-12-04T09:18:20.5777434Z 2025-12-04T09:18:20.5777659Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:18:20.5778286Z Process 1 terminated with exit code 10, terminating remaining processes. 
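
The ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit") asks for an explicit teardown of the default process group. A minimal sketch of that cleanup:

    import torch.distributed as dist

    def teardown():
        # Explicitly tear down the default process group before the process
        # exits, which is what the ProcessGroupNCCL warning above asks for.
        if dist.is_initialized():
            dist.destroy_process_group()
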
2025-12-04T09:18:20.5779743Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-e7575131d09c7d5b.xml - 2025-12-04T09:18:20.5780894Z =========================== short test summary info ============================ 2025-12-04T09:18:20.5782005Z FAILED [7.9933s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:18:20.5783041Z Traceback (most recent call last): 2025-12-04T09:18:20.5783836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.5784624Z getattr(self, test_name)() 2025-12-04T09:18:20.5785380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.5786147Z fn() 2025-12-04T09:18:20.5786787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.5787551Z method(*args, **kwargs) 2025-12-04T09:18:20.5788384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.5789146Z method(*args, **kwargs) 2025-12-04T09:18:20.5789847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.5790708Z with policy(): 2025-12-04T09:18:20.5791348Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.5792057Z raise RuntimeError(msg) 2025-12-04T09:18:20.5793314Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 1. CUDA driver allocated memory was 342818816 and is now 391053312. 2025-12-04T09:18:20.5794521Z 2025-12-04T09:18:20.5794725Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.5795637Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T09:18:20.5796415Z 2025-12-04T09:18:20.5796663Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.5797177Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:18:20.5797602Z ============================== 1 failed in 8.02s =============================== 2025-12-04T09:18:20.5797957Z Got exit code 1 2025-12-04T09:18:20.5798183Z Retrying single test... 
2025-12-04T09:18:20.5798958Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-fea5835408d37079.xml 2025-12-04T09:18:20.5799934Z ============================= test session starts ============================== 2025-12-04T09:18:20.5800522Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:18:20.5801099Z cachedir: .pytest_cache 2025-12-04T09:18:20.5801734Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:18:20.5802436Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:18:20.5802743Z configfile: pytest.ini 2025-12-04T09:18:20.5803397Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:18:20.5804186Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:18:20.5805305Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda 2025-12-04T09:18:20.5806174Z Running 1 items in this shard 2025-12-04T09:18:20.5806386Z 2025-12-04T09:18:20.5807377Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda I1204 09:15:51.440000 32104 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 32156 2025-12-04T09:18:20.5808934Z I1204 09:15:51.441000 32104 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 32157 2025-12-04T09:18:20.5811156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.5813053Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.5815321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.5817339Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.5818635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:18:20.5819882Z return func(*args, **kwargs) 2025-12-04T09:18:20.5821145Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.5822392Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.5823598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.5824818Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.5826219Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.5827310Z seq = FSDP( 2025-12-04T09:18:20.5828327Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.5829419Z seq = FSDP( 2025-12-04T09:18:20.5833792Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:18:20.5838548Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.5842988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:18:20.5847395Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.5848348Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.5849347Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.5850840Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.5852301Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.5854004Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.5855550Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.5857044Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.5858640Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.5860235Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.5861825Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.5863489Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.5865024Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.5866710Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.5868127Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.5870103Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 0. CUDA driver allocated memory was 453967872 and is now 500105216. 
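
The FutureWarning repeated in this log deprecates the `NO_SHARD` sharding strategy in favor of `DistributedDataParallel`. A hedged sketch of the replacement the warning recommends (model and rank are placeholders; NO_SHARD keeps full parameters on every rank, which is exactly what plain DDP does):

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_without_sharding(model: torch.nn.Module, rank: int):
        # Plain DDP replicates full parameters per rank, matching NO_SHARD,
        # and is the replacement the deprecation warning suggests.
        return DDP(model.cuda(rank), device_ids=[rank])
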
2025-12-04T09:18:20.5871950Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.5872981Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.5874644Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T09:18:20.5876026Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.5877124Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.5878431Z [rank0]:E1204 09:15:57.304000 32156 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:18:20.5879987Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.5881126Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.5882812Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.5884469Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.5886108Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.5887640Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.5889144Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.5890734Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.5892487Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.5894343Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.5895945Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.5897493Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.5899054Z [rank1]:E1204 09:15:57.305000 32157 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.5900662Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.5902873Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 1. CUDA driver allocated memory was 347013120 and is now 391053312. 2025-12-04T09:18:20.5904962Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.5906225Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.5907977Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T09:18:20.5909447Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.5910668Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.5911987Z [rank1]:E1204 09:15:57.305000 32157 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:18:20.5912733Z dist init r=0, world=2 2025-12-04T09:18:20.5913005Z dist init r=1, world=2 2025-12-04T09:18:20.5914251Z [rank0]:[W1204 09:15:57.773738116 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:18:20.5915552Z FAILED [7.2856s] [100%] 2025-12-04T09:18:20.5915735Z 2025-12-04T09:18:20.5915876Z =================================== FAILURES =================================== 2025-12-04T09:18:20.5916425Z ____________ TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda _____________ 2025-12-04T09:18:20.5917042Z Traceback (most recent call last): 2025-12-04T09:18:20.5917743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:18:20.5918458Z self._join_processes(fn) 2025-12-04T09:18:20.5919158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:18:20.5919933Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:18:20.5920721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:18:20.5921490Z raise RuntimeError(error) 2025-12-04T09:18:20.5921883Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:18:20.5922385Z Traceback (most recent call last): 2025-12-04T09:18:20.5923089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.5923801Z getattr(self, test_name)() 2025-12-04T09:18:20.5924460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.5925143Z fn() 2025-12-04T09:18:20.5925719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.5926385Z method(*args, **kwargs) 2025-12-04T09:18:20.5927022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.5927695Z method(*args, **kwargs) 2025-12-04T09:18:20.5928332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.5928993Z with policy(): 2025-12-04T09:18:20.5929606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.5930289Z raise RuntimeError(msg) 2025-12-04T09:18:20.5931462Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 0. CUDA driver allocated memory was 453967872 and is now 500105216. 2025-12-04T09:18:20.5932589Z 2025-12-04T09:18:20.5932785Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.5933881Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T09:18:20.5934619Z 2025-12-04T09:18:20.5934900Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.5935310Z 2025-12-04T09:18:20.5935315Z 2025-12-04T09:18:20.5935554Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:18:20.5936235Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:18:20.5937488Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-fea5835408d37079.xml - 2025-12-04T09:18:20.5938649Z =========================== short test summary info ============================ 2025-12-04T09:18:20.5939757Z FAILED [7.2856s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:18:20.5940773Z Traceback (most recent call last): 2025-12-04T09:18:20.5941565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.5942375Z getattr(self, test_name)() 2025-12-04T09:18:20.5943120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.5958475Z fn() 2025-12-04T09:18:20.5959195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.5959894Z method(*args, **kwargs) 2025-12-04T09:18:20.5960546Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.5961214Z method(*args, **kwargs) 2025-12-04T09:18:20.5961852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.5962525Z with policy(): 2025-12-04T09:18:20.5963135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.5963939Z raise RuntimeError(msg) 2025-12-04T09:18:20.5965135Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 0. CUDA driver allocated memory was 453967872 and is now 500105216. 2025-12-04T09:18:20.5966270Z 2025-12-04T09:18:20.5966468Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.5967325Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T09:18:20.5967982Z 2025-12-04T09:18:20.5968221Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.5968753Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:18:20.5969203Z ======================= 1 failed, 3 deselected in 7.31s ======================== 2025-12-04T09:18:20.5969589Z Got exit code 1 2025-12-04T09:18:20.5969828Z Retrying single test... 
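
Each attempt above writes a pytest XML report under test-reports/python-pytest/..., and the harness later logs "Parsing testcases for test report" for each file. A hedged sketch of reading failures back out of one such JUnit-style report with the standard library (the path is copied from the log and may not exist outside this workspace):

    import xml.etree.ElementTree as ET

    report = ("test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/"
              "distributed.fsdp.test_fsdp_fine_tune-fea5835408d37079.xml")
    root = ET.parse(report).getroot()
    for case in root.iter("testcase"):
        failure = case.find("failure")
        if failure is not None:
            print(case.get("classname"), case.get("name"), failure.get("message"))
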
2025-12-04T09:18:20.5970610Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-3b87fcc1c5f1359f.xml 2025-12-04T09:18:20.5971488Z ============================= test session starts ============================== 2025-12-04T09:18:20.5972066Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:18:20.5972603Z cachedir: .pytest_cache 2025-12-04T09:18:20.5973331Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:18:20.5974253Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:18:20.5974662Z configfile: pytest.ini 2025-12-04T09:18:20.5975393Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:18:20.5976275Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:18:20.5977379Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda 2025-12-04T09:18:20.5978312Z Running 1 items in this shard 2025-12-04T09:18:20.5978537Z 2025-12-04T09:18:20.5979734Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda I1204 09:16:03.210000 32299 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 32351 2025-12-04T09:18:20.5981349Z I1204 09:16:03.211000 32299 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 32352 2025-12-04T09:18:20.5983698Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.5985706Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.5987712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.5989725Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.5991193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:18:20.5992420Z return func(*args, **kwargs) 2025-12-04T09:18:20.5993467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.5994554Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.5995622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.5996701Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.5997726Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.5998771Z seq = FSDP( 2025-12-04T09:18:20.5999739Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6000782Z seq = FSDP( 2025-12-04T09:18:20.6004948Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:18:20.6009439Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.6014173Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:18:20.6019151Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.6020160Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.6021293Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.6022981Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6024696Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.6026394Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6027755Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.6029078Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6030498Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6031920Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6033344Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6034769Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6036131Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.6037505Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6038929Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.6040974Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 0. CUDA driver allocated memory was 453967872 and is now 500105216. 
2025-12-04T09:18:20.6042833Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6043860Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6045507Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T09:18:20.6046897Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6047989Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6049226Z [rank0]:E1204 09:16:09.132000 32351 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:18:20.6050248Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.6051252Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.6052829Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6054622Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.6056252Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6057786Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.6059298Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6060895Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6062505Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6064084Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6065781Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6067290Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.6068677Z [rank1]:E1204 09:16:09.133000 32352 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6070155Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.6072116Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 1. CUDA driver allocated memory was 342818816 and is now 391053312. 2025-12-04T09:18:20.6073963Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6075002Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6076662Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T09:18:20.6078052Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6079463Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6081042Z [rank1]:E1204 09:16:09.133000 32352 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:18:20.6081840Z dist init r=1, world=2 2025-12-04T09:18:20.6082131Z dist init r=0, world=2 2025-12-04T09:18:20.6083458Z [rank0]:[W1204 09:16:09.630978166 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:18:20.6084959Z FAILED [7.3195s] [100%] 2025-12-04T09:18:20.6085137Z 2025-12-04T09:18:20.6085301Z =================================== FAILURES =================================== 2025-12-04T09:18:20.6085886Z ____________ TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda _____________ 2025-12-04T09:18:20.6086424Z Traceback (most recent call last): 2025-12-04T09:18:20.6087215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:18:20.6088013Z self._join_processes(fn) 2025-12-04T09:18:20.6088801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:18:20.6089677Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:18:20.6090565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:18:20.6091434Z raise RuntimeError(error) 2025-12-04T09:18:20.6092053Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:18:20.6092499Z Traceback (most recent call last): 2025-12-04T09:18:20.6093201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6094167Z getattr(self, test_name)() 2025-12-04T09:18:20.6094923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6095697Z fn() 2025-12-04T09:18:20.6096344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6097093Z method(*args, **kwargs) 2025-12-04T09:18:20.6097807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6098565Z method(*args, **kwargs) 2025-12-04T09:18:20.6099350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6100109Z with policy(): 2025-12-04T09:18:20.6100797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6101569Z raise RuntimeError(msg) 2025-12-04T09:18:20.6102891Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 1. CUDA driver allocated memory was 342818816 and is now 391053312. 2025-12-04T09:18:20.6104150Z 2025-12-04T09:18:20.6104373Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6105327Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T09:18:20.6106258Z 2025-12-04T09:18:20.6106506Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6106865Z 2025-12-04T09:18:20.6106869Z 2025-12-04T09:18:20.6107082Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:18:20.6107630Z Process 1 terminated with exit code 10, terminating remaining processes. 
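The UserWarnings in this run point at two concrete adjustments: give each rank an explicit CUDA device index before constructing FSDP (via torch.cuda.set_device() or an indexed device_id), and pass device_id to init_process_group so barrier() does not have to guess the device. The snippet below is a hedged sketch of that per-rank setup, not the test's own code; the helper name is hypothetical, and rank/world_size plumbing plus MASTER_ADDR/MASTER_PORT are assumed to be handled elsewhere.

    # Sketch of the setup the warnings ask for (hypothetical helper name).
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def setup_fsdp(module, rank, world_size):
        device = torch.device("cuda", rank)
        torch.cuda.set_device(device)              # explicit index, as the FSDP warning suggests
        dist.init_process_group("nccl", rank=rank, world_size=world_size,
                                device_id=device)  # lets barrier() pick the right device
        return FSDP(module.to(device), device_id=device)  # indexed device instead of bare "cuda"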
2025-12-04T09:18:20.6108742Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-3b87fcc1c5f1359f.xml - 2025-12-04T09:18:20.6109770Z =========================== short test summary info ============================ 2025-12-04T09:18:20.6110738Z FAILED [7.3195s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:18:20.6111715Z Traceback (most recent call last): 2025-12-04T09:18:20.6112426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6113143Z getattr(self, test_name)() 2025-12-04T09:18:20.6113803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6114492Z fn() 2025-12-04T09:18:20.6115066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6115739Z method(*args, **kwargs) 2025-12-04T09:18:20.6116361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6117040Z method(*args, **kwargs) 2025-12-04T09:18:20.6117673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6118325Z with policy(): 2025-12-04T09:18:20.6118935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6119623Z raise RuntimeError(msg) 2025-12-04T09:18:20.6120800Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 1. CUDA driver allocated memory was 342818816 and is now 391053312. 2025-12-04T09:18:20.6121914Z 2025-12-04T09:18:20.6122106Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6122952Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T09:18:20.6123624Z 2025-12-04T09:18:20.6123859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6124383Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:18:20.6124882Z ======================= 1 failed, 3 deselected in 7.34s ======================== 2025-12-04T09:18:20.6125261Z Got exit code 1 2025-12-04T09:18:20.6125884Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda 2025-12-04T09:18:20.6126836Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:18:20.6127911Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-2c3852776dc4d6af.xml 2025-12-04T09:18:20.6128780Z ============================= test session starts ============================== 2025-12-04T09:18:20.6129372Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:18:20.6129908Z cachedir: .pytest_cache 2025-12-04T09:18:20.6130534Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:18:20.6131228Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:18:20.6131543Z configfile: pytest.ini 2025-12-04T09:18:20.6132179Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:18:20.6132962Z collecting ... collected 4 items / 1 deselected / 3 selected 2025-12-04T09:18:20.6133472Z stepcurrent: skipping 1 already run items. 2025-12-04T09:18:20.6133991Z Running 3 items in this shard 2025-12-04T09:18:20.6134216Z 2025-12-04T09:18:20.6135199Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda I1204 09:16:14.980000 32494 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 32546 2025-12-04T09:18:20.6136895Z I1204 09:16:14.981000 32494 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 32547 2025-12-04T09:18:20.6139265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.6141280Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.6143291Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.6145286Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.6146721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:18:20.6147827Z return func(*args, **kwargs) 2025-12-04T09:18:20.6148887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6149953Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.6151010Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6152090Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.6153184Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:246: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6154216Z fsdp_seq = FSDP( 2025-12-04T09:18:20.6155194Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:246: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6156236Z fsdp_seq = FSDP( 2025-12-04T09:18:20.6160371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:18:20.6164805Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.6169256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:18:20.6173977Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.6174982Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.6176128Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.6177805Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6179660Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.6181310Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6182839Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.6184461Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6186046Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6187655Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6189249Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6190946Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6192466Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.6194015Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6195569Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.6197708Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 29696 on device 0. CUDA driver allocated memory was 453967872 and is now 481230848. 
2025-12-04T09:18:20.6199793Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6200932Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6202731Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T09:18:20.6204233Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6205512Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6206841Z [rank0]:E1204 09:16:24.215000 32546 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:18:20.6207926Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.6208993Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.6210572Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6212112Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.6213900Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6215438Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.6217022Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6218609Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6220190Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6221779Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6223376Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6224935Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.6226615Z [rank1]:E1204 09:16:24.216000 32547 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6228034Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.6230004Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 29696 on device 1. CUDA driver allocated memory was 342818816 and is now 372178944. 2025-12-04T09:18:20.6231902Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6232953Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6234593Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T09:18:20.6235973Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6237066Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6238324Z [rank1]:E1204 09:16:24.216000 32547 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:18:20.6239033Z dist init r=0, world=2 2025-12-04T09:18:20.6239280Z dist init r=1, world=2 2025-12-04T09:18:20.6240471Z [rank0]:[W1204 09:16:24.632844723 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:18:20.6241708Z FAILED [10.8095s] [ 33%] 2025-12-04T09:18:20.6241874Z 2025-12-04T09:18:20.6242021Z =================================== FAILURES =================================== 2025-12-04T09:18:20.6242530Z _____________ TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda _____________ 2025-12-04T09:18:20.6243024Z Traceback (most recent call last): 2025-12-04T09:18:20.6243724Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:18:20.6244473Z self._join_processes(fn) 2025-12-04T09:18:20.6245189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:18:20.6246209Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:18:20.6247043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:18:20.6247885Z raise RuntimeError(error) 2025-12-04T09:18:20.6248317Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:18:20.6248790Z Traceback (most recent call last): 2025-12-04T09:18:20.6249512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6250264Z getattr(self, test_name)() 2025-12-04T09:18:20.6250968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6251691Z fn() 2025-12-04T09:18:20.6252284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6252994Z method(*args, **kwargs) 2025-12-04T09:18:20.6253899Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6254703Z method(*args, **kwargs) 2025-12-04T09:18:20.6255420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6256169Z with policy(): 2025-12-04T09:18:20.6256851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6257685Z raise RuntimeError(msg) 2025-12-04T09:18:20.6259028Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 29696 on device 0. CUDA driver allocated memory was 453967872 and is now 481230848. 2025-12-04T09:18:20.6260296Z 2025-12-04T09:18:20.6260515Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6261474Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T09:18:20.6262199Z 2025-12-04T09:18:20.6262464Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6262882Z 2025-12-04T09:18:20.6262886Z 2025-12-04T09:18:20.6263109Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:18:20.6263740Z Process 0 terminated with exit code 10, terminating remaining processes. 
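The FutureWarnings emitted by wrap.py and test_fsdp_fine_tune.py flag the NO_SHARD sharding strategy as deprecated and recommend DistributedDataParallel instead. A minimal sketch of that substitution follows; it assumes the process group is already initialized and `model` is an ordinary nn.Module, and it is not the test's own wrapping code.

    # Replacing an FSDP(NO_SHARD)-style wrapper with DDP, per the FutureWarning.
    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_without_sharding(model, rank):
        model = model.to(torch.device("cuda", rank))
        return DDP(model, device_ids=[rank])  # replicates parameters instead of sharding them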
2025-12-04T09:18:20.6264994Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-2c3852776dc4d6af.xml - 2025-12-04T09:18:20.6266319Z =========================== short test summary info ============================ 2025-12-04T09:18:20.6267294Z FAILED [10.8095s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:18:20.6268209Z Traceback (most recent call last): 2025-12-04T09:18:20.6268913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6269610Z getattr(self, test_name)() 2025-12-04T09:18:20.6270286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6270982Z fn() 2025-12-04T09:18:20.6271565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6272287Z method(*args, **kwargs) 2025-12-04T09:18:20.6272929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6273609Z method(*args, **kwargs) 2025-12-04T09:18:20.6274226Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6274889Z with policy(): 2025-12-04T09:18:20.6275499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6276187Z raise RuntimeError(msg) 2025-12-04T09:18:20.6277357Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 29696 on device 0. CUDA driver allocated memory was 453967872 and is now 481230848. 2025-12-04T09:18:20.6278479Z 2025-12-04T09:18:20.6278816Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6279920Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T09:18:20.6280647Z 2025-12-04T09:18:20.6280929Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6281507Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:18:20.6282012Z ======================= 1 failed, 1 deselected in 10.83s ======================= 2025-12-04T09:18:20.6282436Z Got exit code 1 2025-12-04T09:18:20.6282708Z Retrying single test... 
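The AccumulateGrad stream-mismatch UserWarning that precedes each failure also lists its own remedies: drop lingering references to the autograd graph between iterations (for example the loss tensor), or, when the mismatch is intentional, silence the warning with the toggle it names. A short sketch of the suppression path; the function is quoted verbatim from the warning text and its availability depends on the torch build in use.

    import torch

    # Prefer deleting stale references to the autograd graph (e.g. `del loss`);
    # only suppress the warning when the stream mismatch is known to be intentional.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)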
2025-12-04T09:18:20.6283557Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-9e3a89401a26a2c7.xml 2025-12-04T09:18:20.6284646Z ============================= test session starts ============================== 2025-12-04T09:18:20.6285316Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:18:20.6285905Z cachedir: .pytest_cache 2025-12-04T09:18:20.6286616Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:18:20.6287396Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:18:20.6287752Z configfile: pytest.ini 2025-12-04T09:18:20.6288465Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:18:20.6289350Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:18:20.6290375Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda 2025-12-04T09:18:20.6291412Z Running 1 items in this shard 2025-12-04T09:18:20.6291622Z 2025-12-04T09:18:20.6294992Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda I1204 09:16:30.190000 32689 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 32741 2025-12-04T09:18:20.6296623Z I1204 09:16:30.190000 32689 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 32742 2025-12-04T09:18:20.6298980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.6300999Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.6303134Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.6305128Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.6306573Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:18:20.6307689Z return func(*args, **kwargs) 2025-12-04T09:18:20.6308744Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6309815Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.6310877Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6311955Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.6312990Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:246: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6314023Z fsdp_seq = FSDP( 2025-12-04T09:18:20.6314994Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:246: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6316030Z fsdp_seq = FSDP( 2025-12-04T09:18:20.6320222Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:18:20.6324662Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.6329109Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:18:20.6333848Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.6334843Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.6335989Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.6337683Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6339333Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.6340988Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6342511Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.6344021Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6345729Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6347289Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6348763Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6350176Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6351553Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.6352938Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6354362Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.6356327Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 29696 on device 0. CUDA driver allocated memory was 453967872 and is now 481230848. 
2025-12-04T09:18:20.6358166Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6359209Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6360863Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T09:18:20.6362240Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6363385Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6364615Z [rank0]:E1204 09:16:39.509000 32741 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:18:20.6365613Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.6366593Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.6368066Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6369507Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.6370951Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6372286Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.6373840Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6375415Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6376987Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6378818Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6380401Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6381933Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.6383456Z [rank1]:E1204 09:16:39.510000 32742 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6385027Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.6387228Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 29696 on device 1. CUDA driver allocated memory was 342818816 and is now 372178944. 2025-12-04T09:18:20.6389294Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6390550Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6392316Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T09:18:20.6393763Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6394833Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6396063Z [rank1]:E1204 09:16:39.510000 32742 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:18:20.6396772Z dist init r=0, world=2 2025-12-04T09:18:20.6397003Z dist init r=1, world=2 2025-12-04T09:18:20.6398172Z [rank0]:[W1204 09:16:39.979493943 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:18:20.6399399Z FAILED [10.8065s] [100%] 2025-12-04T09:18:20.6399559Z 2025-12-04T09:18:20.6399690Z =================================== FAILURES =================================== 2025-12-04T09:18:20.6400185Z _____________ TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda _____________ 2025-12-04T09:18:20.6400656Z Traceback (most recent call last): 2025-12-04T09:18:20.6401508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:18:20.6402240Z self._join_processes(fn) 2025-12-04T09:18:20.6402983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:18:20.6403789Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:18:20.6404612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:18:20.6405478Z raise RuntimeError(error) 2025-12-04T09:18:20.6405893Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:18:20.6406346Z Traceback (most recent call last): 2025-12-04T09:18:20.6407065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6407804Z getattr(self, test_name)() 2025-12-04T09:18:20.6408495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6409206Z fn() 2025-12-04T09:18:20.6409790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6410489Z method(*args, **kwargs) 2025-12-04T09:18:20.6411150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6411844Z method(*args, **kwargs) 2025-12-04T09:18:20.6412496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6413192Z with policy(): 2025-12-04T09:18:20.6414251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6415003Z raise RuntimeError(msg) 2025-12-04T09:18:20.6416325Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 29696 on device 0. CUDA driver allocated memory was 453967872 and is now 481230848. 2025-12-04T09:18:20.6417578Z 2025-12-04T09:18:20.6417795Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6418737Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T09:18:20.6419476Z 2025-12-04T09:18:20.6419742Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6420151Z 2025-12-04T09:18:20.6420156Z 2025-12-04T09:18:20.6420448Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:18:20.6421066Z Process 0 terminated with exit code 10, terminating remaining processes. 
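Note on the failure above: the RuntimeError comes from PyTorch's CUDA memory-leak checker (enabled here via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1), which records caching-allocator and driver-level allocations before the test and compares them again afterwards, failing when both numbers grow. Below is a minimal sketch of that kind of before/after check, not the harness's actual implementation; the helper name check_cuda_leak is illustrative only, and the real checker in common_utils.py does more (retries, per-device bookkeeping).

    import gc
    import torch

    def check_cuda_leak(test_fn, device=0):
        """Illustrative before/after CUDA allocation check (not the real test harness)."""
        torch.cuda.synchronize(device)
        gc.collect()
        torch.cuda.empty_cache()
        alloc_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes
        free, total = torch.cuda.mem_get_info(device)
        driver_before = total - free                          # driver-level bytes

        test_fn()

        torch.cuda.synchronize(device)
        gc.collect()
        alloc_after = torch.cuda.memory_allocated(device)
        free, total = torch.cuda.mem_get_info(device)
        driver_after = total - free
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak: allocator {alloc_before} -> {alloc_after}, "
                f"driver {driver_before} -> {driver_after}"
            )

The repro line printed with the failure (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda) reruns the same check standalone; PYTORCH_PRINT_REPRO_ON_FAILURE=0 only silences the repro message, not the check itself.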
2025-12-04T09:18:20.6422308Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-9e3a89401a26a2c7.xml - 2025-12-04T09:18:20.6423436Z =========================== short test summary info ============================ 2025-12-04T09:18:20.6424520Z FAILED [10.8065s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:18:20.6425536Z Traceback (most recent call last): 2025-12-04T09:18:20.6426534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6427339Z getattr(self, test_name)() 2025-12-04T09:18:20.6428004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6428677Z fn() 2025-12-04T09:18:20.6429242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6429899Z method(*args, **kwargs) 2025-12-04T09:18:20.6430521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6431179Z method(*args, **kwargs) 2025-12-04T09:18:20.6431795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6432447Z with policy(): 2025-12-04T09:18:20.6433093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6433764Z raise RuntimeError(msg) 2025-12-04T09:18:20.6434923Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 29696 on device 0. CUDA driver allocated memory was 453967872 and is now 481230848. 2025-12-04T09:18:20.6436039Z 2025-12-04T09:18:20.6436226Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6437062Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T09:18:20.6437708Z 2025-12-04T09:18:20.6437959Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6438468Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:18:20.6438912Z ======================= 1 failed, 3 deselected in 10.83s ======================= 2025-12-04T09:18:20.6439278Z Got exit code 1 2025-12-04T09:18:20.6439516Z Retrying single test... 
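Note on the process-group warnings above: the ProcessGroupNCCL message ("destroy_process_group() was not called before program exit") and the barrier() message about specifying device_id both point at process-group lifecycle hygiene in the test body. A minimal sketch of the pattern those warnings recommend, assuming the standard torch.distributed entry points; the MASTER_ADDR/MASTER_PORT values are placeholders, not taken from this job:

    import os
    import torch
    import torch.distributed as dist

    def run(rank: int, world_size: int):
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # placeholder rendezvous address
        os.environ.setdefault("MASTER_PORT", "29500")      # placeholder port
        torch.cuda.set_device(rank)
        # Passing device_id binds the group to a device and silences the barrier() warning.
        dist.init_process_group("nccl", rank=rank, world_size=world_size,
                                device_id=torch.device("cuda", rank))
        try:
            dist.barrier()
            # ... test body ...
        finally:
            # Explicit teardown avoids the resource-leak warning at program exit.
            dist.destroy_process_group()

This would be launched once per rank, e.g. torch.multiprocessing.spawn(run, args=(2,), nprocs=2), mirroring the two-process world shown in the "dist init r=0, world=2" lines.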
2025-12-04T09:18:20.6440279Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-dadce7936b268df6.xml 2025-12-04T09:18:20.6441145Z ============================= test session starts ============================== 2025-12-04T09:18:20.6441718Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:18:20.6442235Z cachedir: .pytest_cache 2025-12-04T09:18:20.6442858Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:18:20.6443543Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:18:20.6443848Z configfile: pytest.ini 2025-12-04T09:18:20.6444477Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:18:20.6445255Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:18:20.6446211Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda 2025-12-04T09:18:20.6447024Z Running 1 items in this shard 2025-12-04T09:18:20.6447209Z 2025-12-04T09:18:20.6448072Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda I1204 09:16:45.389000 32884 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 32936 2025-12-04T09:18:20.6449493Z I1204 09:16:45.390000 32884 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 32937 2025-12-04T09:18:20.6451581Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.6453419Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.6455550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.6457537Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.6458824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:18:20.6460150Z return func(*args, **kwargs) 2025-12-04T09:18:20.6461335Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6462528Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.6463705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6464898Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.6466140Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:246: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6467166Z fsdp_seq = FSDP( 2025-12-04T09:18:20.6468127Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:246: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6469160Z fsdp_seq = FSDP( 2025-12-04T09:18:20.6473356Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:18:20.6477761Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.6482930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:18:20.6487908Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.6488899Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.6490024Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.6491773Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6493478Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.6495273Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6496807Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.6498373Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6499942Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6501544Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6503124Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6504705Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6506353Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.6507844Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6510152Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.6512356Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 29696 on device 0. CUDA driver allocated memory was 453967872 and is now 481230848. 
2025-12-04T09:18:20.6514301Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6515395Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6517130Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T09:18:20.6518637Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6519878Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6521121Z [rank0]:E1204 09:16:54.648000 32936 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:18:20.6522316Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.6523360Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.6524988Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6526536Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.6528074Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6529597Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.6530929Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6532345Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6534017Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6535602Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6537180Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6538713Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.6540331Z [rank1]:E1204 09:16:54.650000 32937 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6541935Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.6544132Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 29696 on device 1. CUDA driver allocated memory was 347013120 and is now 372178944. 2025-12-04T09:18:20.6546359Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6547390Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6549024Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T09:18:20.6550388Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6551464Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6552693Z [rank1]:E1204 09:16:54.650000 32937 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:18:20.6553390Z dist init r=0, world=2 2025-12-04T09:18:20.6553688Z dist init r=1, world=2 2025-12-04T09:18:20.6554876Z [rank0]:[W1204 09:16:55.112533198 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:18:20.6556097Z FAILED [10.8106s] [100%] 2025-12-04T09:18:20.6556264Z 2025-12-04T09:18:20.6556396Z =================================== FAILURES =================================== 2025-12-04T09:18:20.6556899Z _____________ TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda _____________ 2025-12-04T09:18:20.6557367Z Traceback (most recent call last): 2025-12-04T09:18:20.6558060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:18:20.6558757Z self._join_processes(fn) 2025-12-04T09:18:20.6559469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:18:20.6560237Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:18:20.6561031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:18:20.6561802Z raise RuntimeError(error) 2025-12-04T09:18:20.6562210Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:18:20.6562646Z Traceback (most recent call last): 2025-12-04T09:18:20.6563346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6564056Z getattr(self, test_name)() 2025-12-04T09:18:20.6564719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6565414Z fn() 2025-12-04T09:18:20.6565999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6566683Z method(*args, **kwargs) 2025-12-04T09:18:20.6567363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6568039Z method(*args, **kwargs) 2025-12-04T09:18:20.6568677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6569337Z with policy(): 2025-12-04T09:18:20.6569947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6570635Z raise RuntimeError(msg) 2025-12-04T09:18:20.6571827Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 29696 on device 0. CUDA driver allocated memory was 453967872 and is now 481230848. 2025-12-04T09:18:20.6572946Z 2025-12-04T09:18:20.6573139Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6574346Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T09:18:20.6575095Z 2025-12-04T09:18:20.6575363Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6575766Z 2025-12-04T09:18:20.6575772Z 2025-12-04T09:18:20.6576013Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:18:20.6576626Z Process 0 terminated with exit code 10, terminating remaining processes. 
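Note on the _init_utils.py UserWarning above: it fires because the test passes device_id "cuda" with no index, so FSDP falls back to whatever the current device happens to be on each rank. A minimal sketch of the two fixes the warning itself suggests, assuming the default process group is already initialized (see the earlier sketch) and using placeholder model/rank names:

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_with_fsdp(model: torch.nn.Module, rank: int) -> FSDP:
        # Fix 1: make the current device explicit before constructing FSDP.
        torch.cuda.set_device(rank)
        # Fix 2: pass a device with an explicit index instead of the bare "cuda" string.
        return FSDP(model, device_id=torch.device("cuda", rank))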
2025-12-04T09:18:20.6577883Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-dadce7936b268df6.xml - 2025-12-04T09:18:20.6579243Z =========================== short test summary info ============================ 2025-12-04T09:18:20.6580481Z FAILED [10.8106s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:18:20.6581514Z Traceback (most recent call last): 2025-12-04T09:18:20.6582296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6583101Z getattr(self, test_name)() 2025-12-04T09:18:20.6583858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6584624Z fn() 2025-12-04T09:18:20.6585270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6586025Z method(*args, **kwargs) 2025-12-04T09:18:20.6586735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6587480Z method(*args, **kwargs) 2025-12-04T09:18:20.6588195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6588953Z with policy(): 2025-12-04T09:18:20.6589618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6590385Z raise RuntimeError(msg) 2025-12-04T09:18:20.6591696Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 29696 on device 0. CUDA driver allocated memory was 453967872 and is now 481230848. 2025-12-04T09:18:20.6592804Z 2025-12-04T09:18:20.6593006Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6593838Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T09:18:20.6594505Z 2025-12-04T09:18:20.6594741Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6595341Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:18:20.6595792Z ======================= 1 failed, 3 deselected in 10.83s ======================= 2025-12-04T09:18:20.6596161Z Got exit code 1 2025-12-04T09:18:20.6596782Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda 2025-12-04T09:18:20.6597732Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:18:20.6598819Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-fc03e360104e794a.xml 2025-12-04T09:18:20.6599679Z ============================= test session starts ============================== 2025-12-04T09:18:20.6600276Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:18:20.6600806Z cachedir: .pytest_cache 2025-12-04T09:18:20.6601431Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:18:20.6602125Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:18:20.6602442Z configfile: pytest.ini 2025-12-04T09:18:20.6603285Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:18:20.6604098Z collecting ... collected 4 items / 2 deselected / 2 selected 2025-12-04T09:18:20.6604599Z stepcurrent: skipping 2 already run items. 2025-12-04T09:18:20.6604963Z Running 2 items in this shard 2025-12-04T09:18:20.6605160Z 2025-12-04T09:18:20.6606070Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda I1204 09:17:00.669000 33079 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 33131 2025-12-04T09:18:20.6607607Z I1204 09:17:00.670000 33079 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 33132 2025-12-04T09:18:20.6609817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.6611712Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.6613840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.6615862Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.6617147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:18:20.6618392Z return func(*args, **kwargs) 2025-12-04T09:18:20.6619576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6620793Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.6621971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6623186Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.6624412Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:298: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6625686Z fsdp_seq = FSDP( 2025-12-04T09:18:20.6626780Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:298: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6627818Z fsdp_seq = FSDP( 2025-12-04T09:18:20.6631960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:18:20.6636362Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.6640820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:18:20.6645300Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.6646192Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.6647209Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.6648706Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6650172Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.6651618Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6652981Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.6654712Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6656308Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6657889Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6659479Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6661077Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6662638Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.6664200Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6665792Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.6667965Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17408 on device 0. CUDA driver allocated memory was 453967872 and is now 479133696. 
2025-12-04T09:18:20.6669824Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6670876Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6672498Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T09:18:20.6673848Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6674944Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6676203Z [rank0]:E1204 09:17:06.429000 33131 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:18:20.6677232Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.6678246Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.6680138Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6681791Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.6683438Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6685104Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.6686605Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6688198Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6689799Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6691495Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6692927Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6694601Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.6696160Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6697767Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.6699941Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 16384 on device 1. CUDA driver allocated memory was 342818816 and is now 370081792. 2025-12-04T09:18:20.6702071Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6703228Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6705057Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T09:18:20.6706612Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6707700Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6708950Z [rank1]:E1204 09:17:06.431000 33132 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:18:20.6709650Z dist init r=0, world=2 2025-12-04T09:18:20.6709906Z dist init r=1, world=2 2025-12-04T09:18:20.6711100Z [rank0]:[W1204 09:17:06.894260514 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:18:20.6712335Z FAILED [7.3276s] [ 50%] 2025-12-04T09:18:20.6712497Z 2025-12-04T09:18:20.6712631Z =================================== FAILURES =================================== 2025-12-04T09:18:20.6713148Z ________________ TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda ________________ 2025-12-04T09:18:20.6713631Z Traceback (most recent call last): 2025-12-04T09:18:20.6714369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:18:20.6715079Z self._join_processes(fn) 2025-12-04T09:18:20.6715794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:18:20.6716572Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:18:20.6717343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:18:20.6718290Z raise RuntimeError(error) 2025-12-04T09:18:20.6718719Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:18:20.6719174Z Traceback (most recent call last): 2025-12-04T09:18:20.6719917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6720848Z getattr(self, test_name)() 2025-12-04T09:18:20.6721586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6722328Z fn() 2025-12-04T09:18:20.6722956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6723691Z method(*args, **kwargs) 2025-12-04T09:18:20.6724371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6725103Z method(*args, **kwargs) 2025-12-04T09:18:20.6725793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6726519Z with policy(): 2025-12-04T09:18:20.6727224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6727969Z raise RuntimeError(msg) 2025-12-04T09:18:20.6729221Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17408 on device 0. CUDA driver allocated memory was 453967872 and is now 479133696. 2025-12-04T09:18:20.6730493Z 2025-12-04T09:18:20.6730710Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6731567Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T09:18:20.6732240Z 2025-12-04T09:18:20.6732489Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6732878Z 2025-12-04T09:18:20.6732882Z 2025-12-04T09:18:20.6733091Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:18:20.6733931Z Process 0 terminated with exit code 10, terminating remaining processes. 
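Note on the NO_SHARD FutureWarnings above (emitted from wrap.py:91 and test_fsdp_fine_tune.py:298): NO_SHARD keeps parameters unsharded, which is what DistributedDataParallel does natively, so the deprecation message recommends switching to DDP outright, and this parity test already compares against DDP. A minimal sketch of that replacement, assuming an already-initialized process group and placeholder model/rank names:

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_without_sharding(model: torch.nn.Module, rank: int) -> DDP:
        # Equivalent intent to FSDP's deprecated NO_SHARD strategy: replicate, don't shard.
        model = model.cuda(rank)
        return DDP(model, device_ids=[rank])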
2025-12-04T09:18:20.6735217Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-fc03e360104e794a.xml - 2025-12-04T09:18:20.6736374Z =========================== short test summary info ============================ 2025-12-04T09:18:20.6737435Z FAILED [7.3276s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:18:20.6738435Z Traceback (most recent call last): 2025-12-04T09:18:20.6739215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6740016Z getattr(self, test_name)() 2025-12-04T09:18:20.6740771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6741548Z fn() 2025-12-04T09:18:20.6742248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6743010Z method(*args, **kwargs) 2025-12-04T09:18:20.6743723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6744467Z method(*args, **kwargs) 2025-12-04T09:18:20.6745177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6746131Z with policy(): 2025-12-04T09:18:20.6746743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6747411Z raise RuntimeError(msg) 2025-12-04T09:18:20.6748564Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17408 on device 0. CUDA driver allocated memory was 453967872 and is now 479133696. 2025-12-04T09:18:20.6749664Z 2025-12-04T09:18:20.6749863Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6750685Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T09:18:20.6751309Z 2025-12-04T09:18:20.6751551Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6752077Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:18:20.6752524Z ======================= 1 failed, 2 deselected in 7.35s ======================== 2025-12-04T09:18:20.6752901Z Got exit code 1 2025-12-04T09:18:20.6753132Z Retrying single test... 
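Note on the autograd/graph.py UserWarning above: it is advisory only, firing when an AccumulateGrad node created on an earlier iteration receives a gradient from a different CUDA stream. The warning text itself names both remedies; a minimal sketch of each, with a placeholder loss tensor standing in for the test's real graph:

    import torch

    # Option A, named directly in the warning: disable this specific stream-mismatch warning.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)

    # Option B, also from the warning text: do not keep the old autograd graph alive between
    # iterations, e.g. drop the loss reference once backward() has run.
    loss = torch.randn(1, device="cuda", requires_grad=True).sum()  # placeholder loss
    loss.backward()
    del loss

Whether either matters here is separate from the leak failures; the warning appears on both the failing and the retried runs and does not by itself fail the memory-leak check.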
2025-12-04T09:18:20.6753900Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-2239b5ce820a8e80.xml 2025-12-04T09:18:20.6754842Z ============================= test session starts ============================== 2025-12-04T09:18:20.6755419Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:18:20.6755955Z cachedir: .pytest_cache 2025-12-04T09:18:20.6756590Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:18:20.6757283Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:18:20.6757588Z configfile: pytest.ini 2025-12-04T09:18:20.6758227Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:18:20.6758999Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:18:20.6759864Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda 2025-12-04T09:18:20.6760666Z Running 1 items in this shard 2025-12-04T09:18:20.6760863Z 2025-12-04T09:18:20.6761702Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda I1204 09:17:12.460000 33274 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 33326 2025-12-04T09:18:20.6763097Z I1204 09:17:12.461000 33274 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 33327 2025-12-04T09:18:20.6765186Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.6766962Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.6768790Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.6770555Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.6771699Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:18:20.6772802Z return func(*args, **kwargs) 2025-12-04T09:18:20.6774112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6775326Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.6776514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:18:20.6777721Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.6779083Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:298: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6780261Z fsdp_seq = FSDP( 2025-12-04T09:18:20.6781349Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:298: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6782626Z fsdp_seq = FSDP( 2025-12-04T09:18:20.6787252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:18:20.6792138Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.6796585Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:18:20.6801065Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.6801942Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.6802937Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.6804420Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6805872Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.6807335Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6808676Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.6810002Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6811399Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6812804Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6814564Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6816157Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6817682Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.6819216Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6820801Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.6822972Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 16384 on device 1. CUDA driver allocated memory was 347013120 and is now 370081792. 
2025-12-04T09:18:20.6824992Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6826321Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6827928Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T09:18:20.6829270Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6830412Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6831649Z [rank1]:E1204 09:17:18.181000 33327 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:18:20.6832660Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.6833658Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.6835136Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6836834Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.6838363Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6839795Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.6841201Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6842683Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6844272Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6845949Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6847477Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6848958Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.6850455Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6851993Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.6854341Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 453967872 and is now 479133696. 2025-12-04T09:18:20.6856373Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6857542Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6859358Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T09:18:20.6860942Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6862155Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6863599Z [rank0]:E1204 09:17:18.187000 33326 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:18:20.6864384Z dist init r=1, world=2 2025-12-04T09:18:20.6864657Z dist init r=0, world=2 2025-12-04T09:18:20.6866206Z [rank0]:[W1204 09:17:18.663379913 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:18:20.6867508Z FAILED [7.2308s] [100%] 2025-12-04T09:18:20.6867671Z 2025-12-04T09:18:20.6867821Z =================================== FAILURES =================================== 2025-12-04T09:18:20.6868344Z ________________ TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda ________________ 2025-12-04T09:18:20.6868842Z Traceback (most recent call last): 2025-12-04T09:18:20.6869576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:18:20.6870317Z self._join_processes(fn) 2025-12-04T09:18:20.6871058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:18:20.6872039Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:18:20.6872890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:18:20.6873786Z raise RuntimeError(error) 2025-12-04T09:18:20.6874214Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:18:20.6874691Z Traceback (most recent call last): 2025-12-04T09:18:20.6875701Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6876451Z getattr(self, test_name)() 2025-12-04T09:18:20.6877174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6877910Z fn() 2025-12-04T09:18:20.6878526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6879567Z method(*args, **kwargs) 2025-12-04T09:18:20.6880278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6881031Z method(*args, **kwargs) 2025-12-04T09:18:20.6881719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6882464Z with policy(): 2025-12-04T09:18:20.6883144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6883904Z raise RuntimeError(msg) 2025-12-04T09:18:20.6885186Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 16384 on device 1. CUDA driver allocated memory was 347013120 and is now 370081792. 2025-12-04T09:18:20.6886407Z 2025-12-04T09:18:20.6886621Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6887537Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T09:18:20.6888250Z 2025-12-04T09:18:20.6888519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6888917Z 2025-12-04T09:18:20.6888922Z 2025-12-04T09:18:20.6889246Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:18:20.6889865Z Process 1 terminated with exit code 10, terminating remaining processes. 
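The ProcessGroupNCCL warning above notes that destroy_process_group() was never called before the test processes exited. A minimal sketch (hypothetical, not taken from the test file) of the teardown that warning is asking for:

import torch.distributed as dist

def teardown_distributed():
    # Tear down the default process group only if one was actually created,
    # so the NCCL resources flagged in the warning are released before exit.
    if dist.is_initialized():
        dist.destroy_process_group()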
2025-12-04T09:18:20.6891103Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-2239b5ce820a8e80.xml - 2025-12-04T09:18:20.6892339Z =========================== short test summary info ============================ 2025-12-04T09:18:20.6893432Z FAILED [7.2308s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:18:20.6894576Z Traceback (most recent call last): 2025-12-04T09:18:20.6895367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6896160Z getattr(self, test_name)() 2025-12-04T09:18:20.6896906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6897668Z fn() 2025-12-04T09:18:20.6898311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6899064Z method(*args, **kwargs) 2025-12-04T09:18:20.6899764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6900508Z method(*args, **kwargs) 2025-12-04T09:18:20.6901218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6901957Z with policy(): 2025-12-04T09:18:20.6902740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6903505Z raise RuntimeError(msg) 2025-12-04T09:18:20.6904809Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 16384 on device 1. CUDA driver allocated memory was 347013120 and is now 370081792. 2025-12-04T09:18:20.6906227Z 2025-12-04T09:18:20.6906430Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6907296Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T09:18:20.6907963Z 2025-12-04T09:18:20.6908214Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.6908767Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:18:20.6909232Z ======================= 1 failed, 3 deselected in 7.25s ======================== 2025-12-04T09:18:20.6909626Z Got exit code 1 2025-12-04T09:18:20.6909881Z Retrying single test... 
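The leak check behind the RuntimeError above compares CUDA memory counters taken before and after the test body. A rough sketch of that before/after comparison, assuming the standard torch.cuda counters (illustrative only, not the actual CudaMemoryLeakCheck implementation):

import torch

def check_for_cuda_leak(test_fn, device: int = 0):
    # Snapshot caching-allocator usage before the test body runs.
    torch.cuda.synchronize(device)
    before = torch.cuda.memory_allocated(device)
    test_fn()
    # Flush cached blocks so only memory that is still referenced is counted afterwards.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    after = torch.cuda.memory_allocated(device)
    if after > before:
        raise RuntimeError(f"possible CUDA leak: {before} -> {after} bytes on device {device}")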
2025-12-04T09:18:20.6910680Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-750660f185e24025.xml 2025-12-04T09:18:20.6911598Z ============================= test session starts ============================== 2025-12-04T09:18:20.6912218Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:18:20.6912779Z cachedir: .pytest_cache 2025-12-04T09:18:20.6926560Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:18:20.6927433Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:18:20.6927752Z configfile: pytest.ini 2025-12-04T09:18:20.6928394Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:18:20.6929186Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:18:20.6930192Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda 2025-12-04T09:18:20.6930991Z Running 1 items in this shard 2025-12-04T09:18:20.6931179Z 2025-12-04T09:18:20.6932020Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda I1204 09:17:24.100000 33469 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 33521 2025-12-04T09:18:20.6933511Z I1204 09:17:24.101000 33469 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 33522 2025-12-04T09:18:20.6936005Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.6938016Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.6940015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.6942006Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.6943281Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:18:20.6944591Z return func(*args, **kwargs) 2025-12-04T09:18:20.6945770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6946965Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.6948008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:18:20.6949084Z return fsdp_fn(module, **kwargs) 2025-12-04T09:18:20.6950119Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:298: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6951350Z fsdp_seq = FSDP( 2025-12-04T09:18:20.6952363Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_fine_tune.py:298: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:18:20.6953506Z fsdp_seq = FSDP( 2025-12-04T09:18:20.6957915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:18:20.6962645Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.6967081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:18:20.6971492Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:18:20.6972367Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.6973423Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.6975230Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.6976987Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.6978797Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.6980328Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.6981830Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6983407Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6984986Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.6986567Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.6988147Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.6989679Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.6991365Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.6992883Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.6994811Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17408 on device 0. CUDA driver allocated memory was 453967872 and is now 479133696. 
2025-12-04T09:18:20.6996614Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.6997648Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.6999267Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T09:18:20.7000607Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.7001687Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.7002930Z [rank0]:E1204 09:17:29.878000 33521 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:18:20.7003940Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.7004926Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.7006470Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.7007919Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.7009362Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.7010712Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.7012033Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7013502Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.7015247Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7016826Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.7018410Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.7019934Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.7021559Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.7023148Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.7025317Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 1. CUDA driver allocated memory was 342818816 and is now 370081792. 2025-12-04T09:18:20.7027391Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.7028408Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.7030028Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T09:18:20.7031372Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.7032466Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.7033702Z [rank1]:E1204 09:17:29.879000 33522 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:18:20.7034396Z dist init r=0, world=2 2025-12-04T09:18:20.7034645Z dist init r=1, world=2 2025-12-04T09:18:20.7035898Z [rank0]:[W1204 09:17:30.348628138 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:18:20.7037120Z FAILED [7.2801s] [100%] 2025-12-04T09:18:20.7037275Z 2025-12-04T09:18:20.7037405Z =================================== FAILURES =================================== 2025-12-04T09:18:20.7037908Z ________________ TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda ________________ 2025-12-04T09:18:20.7038375Z Traceback (most recent call last): 2025-12-04T09:18:20.7039062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:18:20.7039760Z self._join_processes(fn) 2025-12-04T09:18:20.7040468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:18:20.7041241Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:18:20.7042011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:18:20.7042766Z raise RuntimeError(error) 2025-12-04T09:18:20.7043158Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:18:20.7043586Z Traceback (most recent call last): 2025-12-04T09:18:20.7044276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.7044977Z getattr(self, test_name)() 2025-12-04T09:18:20.7045632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.7046294Z fn() 2025-12-04T09:18:20.7046859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7047524Z method(*args, **kwargs) 2025-12-04T09:18:20.7048141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7048856Z method(*args, **kwargs) 2025-12-04T09:18:20.7049483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.7050141Z with policy(): 2025-12-04T09:18:20.7050726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.7051406Z raise RuntimeError(msg) 2025-12-04T09:18:20.7052547Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17408 on device 0. CUDA driver allocated memory was 453967872 and is now 479133696. 2025-12-04T09:18:20.7053861Z 2025-12-04T09:18:20.7054089Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.7054996Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T09:18:20.7055703Z 2025-12-04T09:18:20.7055965Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.7056372Z 2025-12-04T09:18:20.7056378Z 2025-12-04T09:18:20.7056601Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:18:20.7057218Z Process 0 terminated with exit code 10, terminating remaining processes. 
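The retried session above repeats the FutureWarning that the NO_SHARD sharding strategy is deprecated and points at DistributedDataParallel as the replacement. A hypothetical sketch of that substitution (wrap_replicated and its arguments are illustrative names, not from the test):

import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_replicated(module: nn.Module, rank: int) -> nn.Module:
    # NO_SHARD kept a full parameter replica on every rank; DDP does the same,
    # which is why the warning suggests it as the drop-in replacement.
    module = module.to(torch.device("cuda", rank))
    return DDP(module, device_ids=[rank])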
2025-12-04T09:18:20.7058448Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-750660f185e24025.xml - 2025-12-04T09:18:20.7059591Z =========================== short test summary info ============================ 2025-12-04T09:18:20.7060707Z FAILED [7.2801s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:18:20.7061682Z Traceback (most recent call last): 2025-12-04T09:18:20.7062456Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.7063242Z getattr(self, test_name)() 2025-12-04T09:18:20.7063988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.7064750Z fn() 2025-12-04T09:18:20.7065379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7066306Z method(*args, **kwargs) 2025-12-04T09:18:20.7066931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7067587Z method(*args, **kwargs) 2025-12-04T09:18:20.7068207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.7068859Z with policy(): 2025-12-04T09:18:20.7069455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.7070119Z raise RuntimeError(msg) 2025-12-04T09:18:20.7071269Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17408 on device 0. CUDA driver allocated memory was 453967872 and is now 479133696. 2025-12-04T09:18:20.7072347Z 2025-12-04T09:18:20.7072536Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.7073348Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T09:18:20.7073967Z 2025-12-04T09:18:20.7074200Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.7074719Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:18:20.7075211Z ======================= 1 failed, 3 deselected in 7.30s ======================== 2025-12-04T09:18:20.7075578Z Got exit code 1 2025-12-04T09:18:20.7076139Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda 2025-12-04T09:18:20.7077043Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:18:20.7078119Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-8a47980366c1ac84.xml 2025-12-04T09:18:20.7079296Z ============================= test session starts ============================== 2025-12-04T09:18:20.7079947Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:18:20.7080537Z cachedir: .pytest_cache 2025-12-04T09:18:20.7081237Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:18:20.7081994Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:18:20.7082337Z configfile: pytest.ini 2025-12-04T09:18:20.7083050Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:18:20.7083924Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:18:20.7084388Z stepcurrent: skipping 3 already run items. 2025-12-04T09:18:20.7084763Z Running 1 items in this shard 2025-12-04T09:18:20.7084967Z 2025-12-04T09:18:20.7085981Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda I1204 09:17:35.890000 33664 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 33716 2025-12-04T09:18:20.7087695Z I1204 09:17:35.891000 33664 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 33717 2025-12-04T09:18:20.7090040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.7092066Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.7094100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.7096094Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.7097386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:18:20.7098609Z return func(*args, **kwargs) 2025-12-04T09:18:20.7099276Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.7100402Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.7102070Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.7103709Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.7105425Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.7106938Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.7108517Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7110003Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.7111491Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7112973Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.7114466Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.7116098Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.7117591Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.7119186Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.7121353Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 0. CUDA driver allocated memory was 453967872 and is now 493813760. 
2025-12-04T09:18:20.7123383Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.7124508Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.7126328Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T09:18:20.7127858Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.7129047Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.7130403Z [rank0]:E1204 09:17:44.660000 33716 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:18:20.7131704Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.7132718Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.7134608Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.7136249Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.7137883Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.7139407Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.7140900Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7142489Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.7144079Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7145661Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.7147319Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.7148669Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.7150091Z [rank1]:E1204 09:17:44.669000 33717 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.7151502Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.7153473Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 1. CUDA driver allocated memory was 342818816 and is now 384761856. 2025-12-04T09:18:20.7155314Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.7156335Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.7157999Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T09:18:20.7159394Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.7160470Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.7161709Z [rank1]:E1204 09:17:44.669000 33717 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:18:20.7162407Z dist init r=0, world=2 2025-12-04T09:18:20.7162653Z dist init r=1, world=2 2025-12-04T09:18:20.7163907Z [rank0]:[W1204 09:17:45.136846354 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:18:20.7165118Z FAILED [10.3266s] [100%] 2025-12-04T09:18:20.7165287Z 2025-12-04T09:18:20.7165419Z =================================== FAILURES =================================== 2025-12-04T09:18:20.7165934Z __________ TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda __________ 2025-12-04T09:18:20.7166424Z Traceback (most recent call last): 2025-12-04T09:18:20.7167109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:18:20.7167807Z self._join_processes(fn) 2025-12-04T09:18:20.7168497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:18:20.7169269Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:18:20.7170050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:18:20.7170802Z raise RuntimeError(error) 2025-12-04T09:18:20.7171196Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:18:20.7171628Z Traceback (most recent call last): 2025-12-04T09:18:20.7172316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.7173009Z getattr(self, test_name)() 2025-12-04T09:18:20.7173920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.7174687Z fn() 2025-12-04T09:18:20.7175321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7176140Z method(*args, **kwargs) 2025-12-04T09:18:20.7176843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7177594Z method(*args, **kwargs) 2025-12-04T09:18:20.7178286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.7179206Z with policy(): 2025-12-04T09:18:20.7179884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.7180630Z raise RuntimeError(msg) 2025-12-04T09:18:20.7181979Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 0. CUDA driver allocated memory was 453967872 and is now 493813760. 2025-12-04T09:18:20.7183263Z 2025-12-04T09:18:20.7183475Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.7184449Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T09:18:20.7185205Z 2025-12-04T09:18:20.7185480Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.7185875Z 2025-12-04T09:18:20.7185880Z 2025-12-04T09:18:20.7186099Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:18:20.7186716Z Process 0 terminated with exit code 10, terminating remaining processes. 
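The UserWarning from torch/distributed/fsdp/_init_utils.py in the session above fires because FSDP was given the bare device string "cuda" with no index. A small sketch of the two fixes the warning itself suggests (build_fsdp is an illustrative helper, not part of the test):

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def build_fsdp(module, rank: int):
    # Either pin the current device per rank before constructing FSDP...
    torch.cuda.set_device(rank)
    # ...or pass a device with an explicit index instead of the bare "cuda".
    device = torch.device("cuda", rank)
    return FSDP(module, device_id=device)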
2025-12-04T09:18:20.7187954Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-8a47980366c1ac84.xml - 2025-12-04T09:18:20.7189095Z =========================== short test summary info ============================ 2025-12-04T09:18:20.7190193Z FAILED [10.3266s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:18:20.7191382Z Traceback (most recent call last): 2025-12-04T09:18:20.7192076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.7192782Z getattr(self, test_name)() 2025-12-04T09:18:20.7193435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.7194110Z fn() 2025-12-04T09:18:20.7194672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7195326Z method(*args, **kwargs) 2025-12-04T09:18:20.7195946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7196618Z method(*args, **kwargs) 2025-12-04T09:18:20.7197241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.7197896Z with policy(): 2025-12-04T09:18:20.7198492Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.7199167Z raise RuntimeError(msg) 2025-12-04T09:18:20.7200347Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 0. CUDA driver allocated memory was 453967872 and is now 493813760. 2025-12-04T09:18:20.7201477Z 2025-12-04T09:18:20.7201668Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.7202530Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T09:18:20.7203287Z 2025-12-04T09:18:20.7203519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.7204222Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:18:20.7204680Z ======================= 1 failed, 3 deselected in 10.35s ======================= 2025-12-04T09:18:20.7205071Z Got exit code 1 2025-12-04T09:18:20.7205314Z Retrying single test... 
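Earlier sessions in this log also surface a UserWarning from torch/autograd/graph.py about an AccumulateGrad node whose stream does not match the stream of the incoming gradient. Per that warning's own text, the options are to stop keeping the autograd graph alive between iterations or, if the mismatch is intentional, to silence the check; a hypothetical sketch of both:

import torch

def train_step(model, batch, targets, loss_fn, optimizer):
    loss = loss_fn(model(batch), targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    # Return a detached scalar so no reference to the autograd graph
    # (the situation flagged by the warning) survives into the next iteration.
    return loss.detach()

# Only if the stream mismatch is intentional, as the warning text states:
torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)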
2025-12-04T09:18:20.7206150Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-ca9abf7ee24c038e.xml 2025-12-04T09:18:20.7207066Z ============================= test session starts ============================== 2025-12-04T09:18:20.7207673Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:18:20.7208225Z cachedir: .pytest_cache 2025-12-04T09:18:20.7208876Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:18:20.7209596Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:18:20.7209922Z configfile: pytest.ini 2025-12-04T09:18:20.7210585Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:18:20.7211405Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:18:20.7212386Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda 2025-12-04T09:18:20.7213344Z Running 1 items in this shard 2025-12-04T09:18:20.7213543Z 2025-12-04T09:18:20.7214702Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda I1204 09:17:50.589000 33859 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 33911 2025-12-04T09:18:20.7216327Z I1204 09:17:50.590000 33859 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 33912 2025-12-04T09:18:20.7218748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.7220752Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.7222747Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.7224744Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.7226213Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:18:20.7227310Z return func(*args, **kwargs) 2025-12-04T09:18:20.7227904Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.7228902Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.7230370Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.7231876Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.7233334Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.7234683Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.7236004Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7237395Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.7238807Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7240223Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.7241624Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.7242996Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.7244365Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.7245775Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.7247817Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 0. CUDA driver allocated memory was 453967872 and is now 493813760. 
2025-12-04T09:18:20.7249665Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.7250697Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.7252346Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T09:18:20.7254016Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.7255243Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.7256649Z [rank0]:E1204 09:17:59.229000 33911 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:18:20.7257778Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.7258902Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.7261103Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.7262751Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.7264392Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.7266104Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.7267433Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7268835Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.7270233Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7271633Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.7273025Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.7274386Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.7275764Z [rank1]:E1204 09:17:59.231000 33912 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.7277225Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.7279533Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 1. CUDA driver allocated memory was 342818816 and is now 384761856. 2025-12-04T09:18:20.7281603Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.7282767Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.7284651Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T09:18:20.7286233Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.7287455Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.7288861Z [rank1]:E1204 09:17:59.231000 33912 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:18:20.7289643Z dist init r=1, world=2 2025-12-04T09:18:20.7289915Z dist init r=0, world=2 2025-12-04T09:18:20.7291439Z [rank0]:[W1204 09:17:59.699957900 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:18:20.7292663Z FAILED [10.4781s] [100%] 2025-12-04T09:18:20.7292833Z 2025-12-04T09:18:20.7292966Z =================================== FAILURES =================================== 2025-12-04T09:18:20.7293534Z __________ TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda __________ 2025-12-04T09:18:20.7294223Z Traceback (most recent call last): 2025-12-04T09:18:20.7294996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:18:20.7295782Z self._join_processes(fn) 2025-12-04T09:18:20.7296566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:18:20.7297424Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:18:20.7298297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:18:20.7299153Z raise RuntimeError(error) 2025-12-04T09:18:20.7299585Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:18:20.7300071Z Traceback (most recent call last): 2025-12-04T09:18:20.7300839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.7301628Z getattr(self, test_name)() 2025-12-04T09:18:20.7302357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.7303114Z fn() 2025-12-04T09:18:20.7303746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7304496Z method(*args, **kwargs) 2025-12-04T09:18:20.7305195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7306126Z method(*args, **kwargs) 2025-12-04T09:18:20.7306746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.7307389Z with policy(): 2025-12-04T09:18:20.7307986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.7308658Z raise RuntimeError(msg) 2025-12-04T09:18:20.7309837Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 1. CUDA driver allocated memory was 342818816 and is now 384761856. 2025-12-04T09:18:20.7310969Z 2025-12-04T09:18:20.7311155Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.7312029Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T09:18:20.7312711Z 2025-12-04T09:18:20.7312946Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.7313300Z 2025-12-04T09:18:20.7313304Z 2025-12-04T09:18:20.7313507Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:18:20.7314043Z Process 1 terminated with exit code 10, terminating remaining processes. 
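The ProcessGroupNCCL warning above notes that destroy_process_group() was never called before the workers exited. A minimal sketch of the teardown it asks for, with the per-rank body left as a placeholder (the function name is illustrative):

    import torch.distributed as dist

    def run_worker():
        # the per-rank test or training body would go here
        pass

    if __name__ == "__main__":
        try:
            run_worker()
        finally:
            if dist.is_initialized():
                dist.destroy_process_group()   # release NCCL communicators before the process exits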
2025-12-04T09:18:20.7315155Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-ca9abf7ee24c038e.xml - 2025-12-04T09:18:20.7316174Z =========================== short test summary info ============================ 2025-12-04T09:18:20.7317207Z FAILED [10.4781s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:18:20.7318115Z Traceback (most recent call last): 2025-12-04T09:18:20.7318812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.7319506Z getattr(self, test_name)() 2025-12-04T09:18:20.7320163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.7320827Z fn() 2025-12-04T09:18:20.7321394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7322069Z method(*args, **kwargs) 2025-12-04T09:18:20.7322680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7323347Z method(*args, **kwargs) 2025-12-04T09:18:20.7323962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.7324625Z with policy(): 2025-12-04T09:18:20.7325210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.7325872Z raise RuntimeError(msg) 2025-12-04T09:18:20.7327048Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 1. CUDA driver allocated memory was 342818816 and is now 384761856. 2025-12-04T09:18:20.7328161Z 2025-12-04T09:18:20.7328360Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.7329212Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T09:18:20.7329901Z 2025-12-04T09:18:20.7330132Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.7330698Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:18:20.7331136Z ======================= 1 failed, 3 deselected in 10.50s ======================= 2025-12-04T09:18:20.7331490Z Got exit code 1 2025-12-04T09:18:20.7331721Z Retrying single test... 
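The repro command printed above enables PyTorch's CUDA memory-leak checker for a single local run. A rough approximation of the before/after comparison it reports, using only public torch.cuda APIs (this sketch is for local debugging and is not the harness implementation; device index 0 is an assumption):

    import torch

    def cuda_mem_snapshot(device=0):
        free, total = torch.cuda.mem_get_info(device)
        # (caching-allocator bytes, bytes in use on the device) roughly mirrors the failure message above
        return torch.cuda.memory_allocated(device), total - free

    torch.cuda.synchronize()
    alloc_before, driver_before = cuda_mem_snapshot()

    # ... run the suspected-leaky test body here ...

    torch.cuda.synchronize()
    torch.cuda.empty_cache()
    alloc_after, driver_after = cuda_mem_snapshot()
    if alloc_after > alloc_before or driver_after > driver_before:
        print(f"possible leak: allocator {alloc_before} -> {alloc_after}, "
              f"driver {driver_before} -> {driver_after}")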
2025-12-04T09:18:20.7332473Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-7eb1dc39773c41c4.xml 2025-12-04T09:18:20.7333382Z ============================= test session starts ============================== 2025-12-04T09:18:20.7334160Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:18:20.7334752Z cachedir: .pytest_cache 2025-12-04T09:18:20.7335454Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:18:20.7336211Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:18:20.7336557Z configfile: pytest.ini 2025-12-04T09:18:20.7337274Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:18:20.7338126Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:18:20.7339167Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda 2025-12-04T09:18:20.7340115Z Running 1 items in this shard 2025-12-04T09:18:20.7340319Z 2025-12-04T09:18:20.7341320Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda I1204 09:18:05.319000 34054 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 34106 2025-12-04T09:18:20.7343026Z I1204 09:18:05.320000 34054 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 34107 2025-12-04T09:18:20.7345366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.7347313Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.7349079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:18:20.7350855Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:18:20.7352005Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:18:20.7353092Z return func(*args, **kwargs) 2025-12-04T09:18:20.7353685Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.7354688Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.7356165Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.7357617Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.7359116Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.7360467Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.7361787Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7363191Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.7364595Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7365991Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.7367396Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.7368757Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.7370295Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.7371832Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.7374180Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 0. CUDA driver allocated memory was 453967872 and is now 493813760. 
2025-12-04T09:18:20.7376269Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.7377431Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.7379588Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T09:18:20.7381173Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.7382385Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.7383781Z [rank0]:E1204 09:18:13.757000 34106 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:18:20.7384914Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:18:20.7386020Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:18:20.7387696Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.7389441Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:18:20.7390429Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.7390941Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:18:20.7392048Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7392490Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.7393339Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7393779Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:18:20.7394621Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.7395012Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:18:20.7395943Z [rank1]:E1204 09:18:13.758000 34107 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.7396417Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:18:20.7397842Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 1. CUDA driver allocated memory was 342818816 and is now 384761856. 2025-12-04T09:18:20.7398162Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.7398749Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.7399710Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T09:18:20.7400032Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:18:20.7400856Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.7401368Z [rank1]:E1204 09:18:13.758000 34107 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:18:20.7401477Z dist init r=0, world=2 2025-12-04T09:18:20.7401569Z dist init r=1, world=2 2025-12-04T09:18:20.7402649Z [rank0]:[W1204 09:18:14.239930504 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:18:20.7402814Z FAILED [10.3910s] [100%] 2025-12-04T09:18:20.7402821Z 2025-12-04T09:18:20.7402956Z =================================== FAILURES =================================== 2025-12-04T09:18:20.7403233Z __________ TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda __________ 2025-12-04T09:18:20.7403345Z Traceback (most recent call last): 2025-12-04T09:18:20.7403857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:18:20.7403970Z self._join_processes(fn) 2025-12-04T09:18:20.7404519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:18:20.7404665Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:18:20.7405228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:18:20.7405336Z raise RuntimeError(error) 2025-12-04T09:18:20.7405566Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:18:20.7405679Z Traceback (most recent call last): 2025-12-04T09:18:20.7406181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.7406299Z getattr(self, test_name)() 2025-12-04T09:18:20.7406798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.7406893Z fn() 2025-12-04T09:18:20.7407368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7407563Z method(*args, **kwargs) 2025-12-04T09:18:20.7408040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7408136Z method(*args, **kwargs) 2025-12-04T09:18:20.7408620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.7408713Z with policy(): 2025-12-04T09:18:20.7409187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.7409297Z raise RuntimeError(msg) 2025-12-04T09:18:20.7410368Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 1. CUDA driver allocated memory was 342818816 and is now 384761856. 2025-12-04T09:18:20.7410374Z 2025-12-04T09:18:20.7410587Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.7411180Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T09:18:20.7411190Z 2025-12-04T09:18:20.7411436Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.7411441Z 2025-12-04T09:18:20.7411446Z 2025-12-04T09:18:20.7411655Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:18:20.7411901Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:18:20.7412704Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-7eb1dc39773c41c4.xml - 2025-12-04T09:18:20.7412860Z =========================== short test summary info ============================ 2025-12-04T09:18:20.7413838Z FAILED [10.3910s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:18:20.7413972Z Traceback (most recent call last): 2025-12-04T09:18:20.7414583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:18:20.7414699Z getattr(self, test_name)() 2025-12-04T09:18:20.7415235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:18:20.7415322Z fn() 2025-12-04T09:18:20.7415833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7415939Z method(*args, **kwargs) 2025-12-04T09:18:20.7416437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:18:20.7416556Z method(*args, **kwargs) 2025-12-04T09:18:20.7417061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:18:20.7417170Z with policy(): 2025-12-04T09:18:20.7417682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:18:20.7417788Z raise RuntimeError(msg) 2025-12-04T09:18:20.7418936Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 1. CUDA driver allocated memory was 342818816 and is now 384761856. 2025-12-04T09:18:20.7418942Z 2025-12-04T09:18:20.7419161Z To execute this test, run the following from the base repo dir: 2025-12-04T09:18:20.7419799Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T09:18:20.7419876Z 2025-12-04T09:18:20.7420142Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:18:20.7420323Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:18:20.7420506Z ======================= 1 failed, 3 deselected in 10.41s ======================= 2025-12-04T09:18:20.7420598Z Got exit code 1 2025-12-04T09:18:20.7421162Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda 2025-12-04T09:18:20.7421565Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:18:20.7422239Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-b3cb6eb5be1f3e0c.xml 2025-12-04T09:18:20.7422407Z ============================= test session starts ============================== 2025-12-04T09:18:20.7422755Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:18:20.7422873Z cachedir: .pytest_cache 2025-12-04T09:18:20.7423391Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:18:20.7423512Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:18:20.7423629Z configfile: pytest.ini 2025-12-04T09:18:20.7424163Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:18:20.7424366Z collecting ... collected 4 items / 4 deselected / 0 selected 2025-12-04T09:18:20.7424511Z stepcurrent: skipping 4 already run items. 2025-12-04T09:18:20.7424623Z Running 0 items in this shard 2025-12-04T09:18:20.7424628Z 2025-12-04T09:18:20.7425487Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-b3cb6eb5be1f3e0c.xml - 2025-12-04T09:18:20.7425656Z ============================ 4 deselected in 0.01s ============================= 2025-12-04T09:18:20.7427822Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda', 'test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda', 'test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda', 'test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda'] 2025-12-04T09:18:20.7427839Z 2025-12-04T09:18:20.7428450Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_fine_tune 1/1 (test/test-reports/distributed.fsdp.test_fsdp_fine_tune_1.1_200ce5473d48270d_.log) 2025-12-04T09:18:20.7428455Z 2025-12-04T09:18:20.7428929Z Finished distributed/fsdp/test_fsdp_fine_tune 1/1 ... 
[2025-12-04 09:18:20.564096][1526.666222586], took 2.75min 2025-12-04T09:18:20.7429761Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-e7575131d09c7d5b.xml 2025-12-04T09:18:20.7430557Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-fea5835408d37079.xml 2025-12-04T09:18:20.7431356Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-3b87fcc1c5f1359f.xml 2025-12-04T09:18:20.7432142Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-2c3852776dc4d6af.xml 2025-12-04T09:18:20.7432932Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-9e3a89401a26a2c7.xml 2025-12-04T09:18:20.7618613Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-dadce7936b268df6.xml 2025-12-04T09:18:20.7895628Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-fc03e360104e794a.xml 2025-12-04T09:18:20.8196773Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-2239b5ce820a8e80.xml 2025-12-04T09:18:20.8468313Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-750660f185e24025.xml 2025-12-04T09:18:20.8708072Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-8a47980366c1ac84.xml 2025-12-04T09:18:20.8993278Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-ca9abf7ee24c038e.xml 2025-12-04T09:18:20.9281340Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-7eb1dc39773c41c4.xml 2025-12-04T09:18:20.9560224Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-b3cb6eb5be1f3e0c.xml 2025-12-04T09:18:21.1683134Z Uploading logs for 57116084892 to S3 2025-12-04T09:18:21.2044261Z Uploading artifacts took 0.23 seconds 2025-12-04T09:18:21.2044733Z distributed/fsdp/test_fsdp_fine_tune 1/1 failed! 2025-12-04T09:18:21.2046831Z Running distributed/fsdp/test_fsdp_dtensor_state_dict 1/1 ... 
[2025-12-04 09:18:21.204549][1527.306679824] 2025-12-04T09:18:21.2047547Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:18:21.2050915Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_dtensor_state_dict.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:18:21.204872] 2025-12-04T09:28:36.7937528Z 2025-12-04T09:28:36.7938586Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_dtensor_state_dict 1/1 (test/test-reports/distributed.fsdp.test_fsdp_dtensor_state_dict_1.1_e652baa949161530_.log) 2025-12-04T09:28:36.7940282Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-dcb6c7b6743de89e.xml 2025-12-04T09:28:36.7941374Z ============================= test session starts ============================== 2025-12-04T09:28:36.7942100Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:36.7942701Z cachedir: .pytest_cache 2025-12-04T09:28:36.7943432Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:36.7944235Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:36.7944600Z configfile: pytest.ini 2025-12-04T09:28:36.7945322Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:36.7947265Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.7948825Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:36.7950702Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.7952230Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:36.7952614Z collected 15 items 2025-12-04T09:28:36.7952971Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:28:36.7966990Z Running 15 items in this shard: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda, test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda, test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda, test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda, test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda, 
test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda, test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda, test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda, test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda, test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda, test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda, test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda, test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda, test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda, test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_raises_warning_or_errors_cuda 2025-12-04T09:28:36.7981807Z 2025-12-04T09:28:36.7983250Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda I1204 09:18:24.639000 34306 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 34358 2025-12-04T09:28:36.7985308Z I1204 09:18:24.640000 34306 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 34359 2025-12-04T09:28:36.7986441Z I1204 09:18:24.641000 34306 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 34360 2025-12-04T09:28:36.7987572Z I1204 09:18:24.642000 34306 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 34361 2025-12-04T09:28:36.7990771Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.7993408Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.7996004Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.7998635Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.8001238Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.8003855Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.8006522Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.8009149Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.8013941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.8018958Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.8024002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. 
(Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.8029051Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.8066906Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.8072037Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.8077185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
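The AccumulateGrad warning repeated above offers two remedies: drop references that keep the previous iteration's autograd graph alive, or suppress the check with the call it names. A minimal sketch of both, assuming a CUDA device is available (the Linear model and tensors are placeholders):

    import torch

    # Remedy 1: do not hold on to the loss (and thus the old autograd graph) across iterations.
    model = torch.nn.Linear(4, 4).cuda()
    inputs = torch.randn(2, 4, device="cuda")
    loss = model(inputs).sum()
    loss.backward()
    del loss                      # lets the old graph and its AccumulateGrad nodes be freed

    # Remedy 2: if the stream mismatch is intentional, silence the warning as the message suggests.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)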
2025-12-04T09:28:36.8082406Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.8083382Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.8084464Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.8086098Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8087691Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.8089272Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8090832Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.8092213Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8094081Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8095601Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8097123Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8098639Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8100119Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.8101614Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8103135Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.8105751Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 518979584 and is now 613351424. 
2025-12-04T09:28:36.8108061Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8109131Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8111402Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8113294Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8114407Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8115696Z E1204 09:18:31.905000 34359 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:36.8116751Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.8117790Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.8119345Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8120976Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.8122462Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8123895Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.8125247Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8126674Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8128117Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8129562Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8131195Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8132628Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.8134328Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8135864Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.8138390Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 636420096 and is now 720306176. 2025-12-04T09:28:36.8140748Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8141908Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8144173Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8146333Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8147431Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8148706Z E1204 09:18:31.905000 34358 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:36.8149720Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.8150733Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.8152258Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8153757Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.8155293Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8156641Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.8158138Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8159579Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8161018Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8162440Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8165684Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8167089Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.8168501Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8170008Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.8172294Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:36.8174772Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8175881Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8178156Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8180316Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8181502Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8182840Z E1204 09:18:31.906000 34361 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:36.8183921Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.8184992Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.8186608Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8188176Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.8189873Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8191472Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.8192752Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8194118Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8195462Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8196821Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8198186Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8199513Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.8200833Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8202205Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.8204515Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 523173888 and is now 611254272. 
2025-12-04T09:28:36.8206624Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8207614Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8209628Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8211358Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8212398Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8213826Z E1204 09:18:31.908000 34360 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:36.8214594Z FAILED [9.0961s] [ 6%] 2025-12-04T09:28:36.8214776Z 2025-12-04T09:28:36.8214929Z =================================== FAILURES =================================== 2025-12-04T09:28:36.8215853Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:36.8216749Z Traceback (most recent call last): 2025-12-04T09:28:36.8217622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:36.8218411Z self._join_processes(fn) 2025-12-04T09:28:36.8219216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:36.8220088Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:36.8220959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:36.8221820Z raise RuntimeError(error) 2025-12-04T09:28:36.8222279Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:36.8222776Z Traceback (most recent call last): 2025-12-04T09:28:36.8223545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8224348Z getattr(self, test_name)() 2025-12-04T09:28:36.8225101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8225923Z fn() 2025-12-04T09:28:36.8226511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8227187Z method(*args, **kwargs) 2025-12-04T09:28:36.8227823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8228484Z method(*args, **kwargs) 2025-12-04T09:28:36.8229120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8229795Z with policy(): 2025-12-04T09:28:36.8230385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T09:28:36.8231071Z raise RuntimeError(msg) 2025-12-04T09:28:36.8232645Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 636420096 and is now 720306176. 2025-12-04T09:28:36.8234345Z 2025-12-04T09:28:36.8234550Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8235884Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8237006Z 2025-12-04T09:28:36.8237259Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8237646Z 2025-12-04T09:28:36.8237650Z 2025-12-04T09:28:36.8237870Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:36.8238463Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:36.8239732Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-dcb6c7b6743de89e.xml - 2025-12-04T09:28:36.8240887Z =========================== short test summary info ============================ 2025-12-04T09:28:36.8242329Z FAILED [9.0961s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:36.8243700Z Traceback (most recent call last): 2025-12-04T09:28:36.8244613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8245450Z getattr(self, test_name)() 2025-12-04T09:28:36.8246174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8246935Z fn() 2025-12-04T09:28:36.8247567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8248293Z method(*args, **kwargs) 2025-12-04T09:28:36.8248989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8249730Z method(*args, **kwargs) 2025-12-04T09:28:36.8250413Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8251125Z with policy(): 2025-12-04T09:28:36.8251779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8252519Z raise RuntimeError(msg) 2025-12-04T09:28:36.8254425Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 636420096 and is now 720306176. 
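Editor's note on the failure above: each rank still had CUDA memory outstanding when the leak checker ran. On device 0 the caching allocator went from 0 to 2560 bytes, while driver-level usage grew by 720306176 − 636420096 = 83886080 bytes (exactly 80 MiB). The sketch below shows the kind of before/after comparison that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables around a test body; it is a minimal illustration, not the actual code in torch/testing/_internal/common_utils.py, and `run_test_body` is a placeholder.

```python
# Minimal sketch (NOT the real implementation in common_utils.py) of the
# before/after comparison the CUDA mem-leak checker performs. The real
# checker also queries the CUDA driver API; memory_reserved() is only a
# rough stand-in for that here.
import gc
import torch

def check_for_cuda_leak(run_test_body, device=0):
    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    allocated_before = torch.cuda.memory_allocated(device)  # caching-allocator bytes
    reserved_before = torch.cuda.memory_reserved(device)    # proxy for driver-level usage

    run_test_body()

    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    allocated_after = torch.cuda.memory_allocated(device)
    reserved_after = torch.cuda.memory_reserved(device)

    if allocated_after > allocated_before:
        raise RuntimeError(
            f"Possible CUDA leak on device {device}: caching allocator went from "
            f"{allocated_before} to {allocated_after} bytes "
            f"(reserved {reserved_before} -> {reserved_after})."
        )
```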
2025-12-04T09:28:36.8256063Z 2025-12-04T09:28:36.8256282Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8257706Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8258913Z 2025-12-04T09:28:36.8259182Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8259784Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:36.8260251Z ============================== 1 failed in 9.31s =============================== 2025-12-04T09:28:36.8260648Z Got exit code 1 2025-12-04T09:28:36.8260982Z Retrying single test... 2025-12-04T09:28:36.8261928Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c0871d667bd4df8d.xml 2025-12-04T09:28:36.8262991Z ============================= test session starts ============================== 2025-12-04T09:28:36.8263649Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:36.8264247Z cachedir: .pytest_cache 2025-12-04T09:28:36.8264935Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:36.8265817Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:36.8266161Z configfile: pytest.ini 2025-12-04T09:28:36.8266837Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:36.8268662Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.8270087Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:36.8271490Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.8272909Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:36.8273310Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:36.8274744Z stepcurrent: skipping 0 already run items. 
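Editor's note: the PytestCollectionWarning lines in the session above are benign. `TestDummyModel` and `TestDummyModelUneven` are nn.Module helpers, not test cases, and pytest refuses to collect classes that define `__init__`. If the warning were unwanted, a common fix (hypothetical here; not what test_fsdp_dtensor_state_dict.py does) is to mark the helper as non-collectable or rename it so it does not start with `Test`:

```python
# Hypothetical ways to silence PytestCollectionWarning for a helper class
# whose name starts with "Test"; the layer sizes are placeholders, not the
# real TestDummyModel from test_fsdp_dtensor_state_dict.py.
import torch

class TestDummyModel(torch.nn.Module):
    __test__ = False  # tells pytest this class is not a test case

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(8, 8)

    def forward(self, x):
        return self.net(x)

# Alternatively, rename the helper (e.g. DummyModel) so pytest's "Test*"
# collection heuristic never considers it.
```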
Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8275976Z Running 1 items in this shard 2025-12-04T09:28:36.8276162Z 2025-12-04T09:28:36.8277663Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda I1204 09:18:38.479000 34639 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 34691 2025-12-04T09:28:36.8280104Z I1204 09:18:38.480000 34639 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 34692 2025-12-04T09:28:36.8281232Z I1204 09:18:38.481000 34639 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 34693 2025-12-04T09:28:36.8282373Z I1204 09:18:38.482000 34639 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 34694 2025-12-04T09:28:36.8285412Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.8288028Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.8290622Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.8293448Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.8296180Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.8298804Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.8301391Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
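Editor's note: the FutureWarning repeated above points at the newer torch.distributed.checkpoint.state_dict helpers. The sketch below uses the model-only variants of that API; it assumes `model` is already FSDP-wrapped inside an initialized process group, and option names can vary by PyTorch version, so treat it as an illustration rather than the test's own code.

```python
# Hedged sketch of the replacement for FSDP.set_state_dict_type(): the
# torch.distributed.checkpoint.state_dict helpers named in the warning.
# Assumes an initialized process group and an FSDP-wrapped `model`.
from torch.distributed.checkpoint.state_dict import (
    StateDictOptions,
    get_model_state_dict,
    set_model_state_dict,
)

def roundtrip_sharded_state_dict(model):
    # Sharded (DTensor) state dict kept on GPU; flip the options for a
    # full, CPU-offloaded dict instead.
    options = StateDictOptions(full_state_dict=False, cpu_offload=False)
    sharded_sd = get_model_state_dict(model, options=options)
    set_model_state_dict(model, sharded_sd, options=options)
    return sharded_sd
```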
2025-12-04T09:28:36.8304019Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.8308572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.8313022Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.8317462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.8321866Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.8326352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:28:36.8330760Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.8335626Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.8340587Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.8341628Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.8342705Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.8344327Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8346011Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.8347423Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8348728Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.8350016Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8351388Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8352744Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8354105Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8355465Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8356790Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T09:28:36.8358164Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8359514Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.8361757Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 481230848 and is now 617545728. 2025-12-04T09:28:36.8363877Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8364870Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8366884Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8368618Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8369644Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8370888Z E1204 09:18:45.783000 34694 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:36.8371855Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.8372805Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.8374537Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8376124Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.8377715Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8379364Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.8380809Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8382340Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8383875Z E1204 09:18:45.788000 34691 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8385401Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8386941Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8388514Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.8390017Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8391549Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.8393808Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 2025-12-04T09:28:36.8395939Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8396914Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8398932Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8400672Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8401786Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8402991Z E1204 09:18:45.788000 34691 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:36.8403950Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.8404906Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.8406344Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8407759Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.8409171Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8410468Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.8411751Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8413119Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8414839Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8416364Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8417962Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8419451Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.8420953Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8422489Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.8425019Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 518979584 and is now 611254272. 
2025-12-04T09:28:36.8427447Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8428437Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8430449Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8432245Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8433274Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8434474Z E1204 09:18:45.789000 34692 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:36.8435442Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.8436392Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.8437826Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8439226Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.8440637Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8441936Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.8443215Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8444572Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8445935Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8447355Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8448723Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8450037Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.8451355Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8452720Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.8455396Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:36.8457783Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8458889Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8461138Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8464429Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8465711Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8467043Z E1204 09:18:45.791000 34693 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:36.8467724Z FAILED [9.2036s] [100%] 2025-12-04T09:28:36.8467885Z 2025-12-04T09:28:36.8468019Z =================================== FAILURES =================================== 2025-12-04T09:28:36.8468835Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:36.8469623Z Traceback (most recent call last): 2025-12-04T09:28:36.8470315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:36.8471029Z self._join_processes(fn) 2025-12-04T09:28:36.8471743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:36.8472521Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:36.8473290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:36.8474056Z raise RuntimeError(error) 2025-12-04T09:28:36.8474463Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:36.8474905Z Traceback (most recent call last): 2025-12-04T09:28:36.8475589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8476301Z getattr(self, test_name)() 2025-12-04T09:28:36.8477023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8477699Z fn() 
2025-12-04T09:28:36.8478274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8479274Z method(*args, **kwargs) 2025-12-04T09:28:36.8480176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8480921Z method(*args, **kwargs) 2025-12-04T09:28:36.8481633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8482392Z with policy(): 2025-12-04T09:28:36.8483057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8483826Z raise RuntimeError(msg) 2025-12-04T09:28:36.8485545Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 481230848 and is now 617545728. 2025-12-04T09:28:36.8487174Z 2025-12-04T09:28:36.8487406Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8488829Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8490017Z 2025-12-04T09:28:36.8490285Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8490801Z 2025-12-04T09:28:36.8490806Z 2025-12-04T09:28:36.8491029Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:36.8491868Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:36.8493131Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c0871d667bd4df8d.xml - 2025-12-04T09:28:36.8494576Z =========================== short test summary info ============================ 2025-12-04T09:28:36.8496104Z FAILED [9.2036s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:36.8497568Z Traceback (most recent call last): 2025-12-04T09:28:36.8498365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8499160Z getattr(self, test_name)() 2025-12-04T09:28:36.8499922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8500693Z fn() 2025-12-04T09:28:36.8501341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8502084Z method(*args, **kwargs) 2025-12-04T09:28:36.8502796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8503551Z method(*args, **kwargs) 2025-12-04T09:28:36.8504247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8504994Z with policy(): 2025-12-04T09:28:36.8505759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8506438Z raise RuntimeError(msg) 2025-12-04T09:28:36.8508014Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 481230848 and is now 617545728. 2025-12-04T09:28:36.8509471Z 2025-12-04T09:28:36.8509667Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8510937Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8512005Z 2025-12-04T09:28:36.8512259Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8512793Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:36.8513233Z ======================= 1 failed, 14 deselected in 9.42s ======================= 2025-12-04T09:28:36.8513606Z Got exit code 1 2025-12-04T09:28:36.8513846Z Retrying single test... 
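Editor's note: before each retry the harness prints the exact repro command. The helper below simply wraps that command with the environment variables the log itself mentions; the command string and variable names are verbatim from the log, while the function name and `repo_dir` argument are illustrative.

```python
# Sketch of reproducing the failure locally. The command and environment
# variable names come from the log above; everything else is illustrative.
import os
import subprocess

def reproduce_leak_failure(repo_dir):
    env = dict(os.environ)
    env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"   # enable the leak checker
    # env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"   # silence the repro banner if desired
    cmd = [
        "python",
        "test/distributed/fsdp/test_fsdp_dtensor_state_dict.py",
        "TestFSDPWithDeviceMeshAndDTensorCUDA."
        "test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda",
    ]
    return subprocess.run(cmd, cwd=repo_dir, env=env, check=False)
```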
2025-12-04T09:28:36.8514680Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-b1a99f4c33297699.xml 2025-12-04T09:28:36.8515619Z ============================= test session starts ============================== 2025-12-04T09:28:36.8516213Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:36.8516751Z cachedir: .pytest_cache 2025-12-04T09:28:36.8517376Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:36.8518134Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:36.8518451Z configfile: pytest.ini 2025-12-04T09:28:36.8519091Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:36.8520809Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.8522170Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:36.8523501Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.8524857Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:36.8525247Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:36.8526522Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8527757Z Running 1 items in this shard 2025-12-04T09:28:36.8527947Z 2025-12-04T09:28:36.8529214Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda I1204 09:18:52.340000 34972 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 35024 2025-12-04T09:28:36.8531010Z I1204 09:18:52.341000 34972 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 35025 2025-12-04T09:28:36.8532025Z I1204 09:18:52.341000 34972 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 35026 2025-12-04T09:28:36.8533042Z I1204 09:18:52.342000 34972 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 35027 2025-12-04T09:28:36.8536331Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:36.8538951Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.8541533Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.8544149Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.8546837Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.8549161Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.8551454Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.8553818Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.8557971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.8562369Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.8566854Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.8571272Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.8576209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.8581376Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.8586395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
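Editor's note: the UserWarning above names its own off-switch. If the AccumulateGrad stream mismatch is known to be intentional, the call below (copied from the warning text; availability depends on the PyTorch build in use) disables the warning process-wide:

```python
# Opt-out named in the warning text above; only appropriate when the
# AccumulateGrad stream mismatch is intentional. Availability depends on
# the PyTorch version in use.
import torch

torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)
```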
2025-12-04T09:28:36.8591467Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.8592331Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.8593294Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.8594731Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8596136Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.8597548Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8598863Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.8600204Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8601566Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8602909Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8604269Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8605626Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8606973Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.8608294Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8609671Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.8611917Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 531562496 and is now 617545728. 
2025-12-04T09:28:36.8614383Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8615501Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8617771Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8619717Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8620895Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8622258Z E1204 09:18:59.482000 35026 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:36.8623350Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.8624410Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.8626131Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8627629Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.8629121Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8630567Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.8631917Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8633351Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8634803Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8636270Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8637624Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8638947Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.8640277Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8641636Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.8643872Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 632225792 and is now 720306176. 2025-12-04T09:28:36.8646034Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8647011Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8649022Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8650760Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8651805Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8653009Z E1204 09:18:59.484000 35024 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:36.8654242Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.8655312Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.8656946Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8658535Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.8660177Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8661640Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.8663087Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8664618Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8666232Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8667580Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8668950Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8670265Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.8671595Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8672950Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.8675267Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:36.8677385Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8678370Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8680869Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8682835Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8683999Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8685347Z E1204 09:18:59.485000 35027 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:36.8686433Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.8687517Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.8689122Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8690807Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.8692381Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8693949Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.8695455Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8696977Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8698529Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8700058Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8701590Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8703075Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.8704577Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8706267Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.8708518Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 518979584 and is now 611254272. 
2025-12-04T09:28:36.8710637Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8711622Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8713609Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8715354Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8716388Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8717584Z E1204 09:18:59.485000 35025 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:36.8718446Z FAILED [9.2708s] [100%] 2025-12-04T09:28:36.8718626Z 2025-12-04T09:28:36.8718768Z =================================== FAILURES =================================== 2025-12-04T09:28:36.8719628Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:36.8720462Z Traceback (most recent call last): 2025-12-04T09:28:36.8721244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:36.8721993Z self._join_processes(fn) 2025-12-04T09:28:36.8722747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:36.8723569Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:36.8724391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:36.8725208Z raise RuntimeError(error) 2025-12-04T09:28:36.8725637Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:36.8726094Z Traceback (most recent call last): 2025-12-04T09:28:36.8726834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8727770Z getattr(self, test_name)() 2025-12-04T09:28:36.8728504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8729241Z fn() 2025-12-04T09:28:36.8729972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8730686Z method(*args, **kwargs) 2025-12-04T09:28:36.8731341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8732053Z method(*args, **kwargs) 2025-12-04T09:28:36.8732718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8733552Z with policy(): 2025-12-04T09:28:36.8734390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T09:28:36.8735161Z raise RuntimeError(msg) 2025-12-04T09:28:36.8736881Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 632225792 and is now 720306176. 2025-12-04T09:28:36.8738511Z 2025-12-04T09:28:36.8738745Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8740160Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8741374Z 2025-12-04T09:28:36.8741644Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8742057Z 2025-12-04T09:28:36.8742224Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:36.8742654Z Traceback (most recent call last): 2025-12-04T09:28:36.8743434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8744233Z getattr(self, test_name)() 2025-12-04T09:28:36.8744986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8745858Z fn() 2025-12-04T09:28:36.8746539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8747213Z method(*args, **kwargs) 2025-12-04T09:28:36.8747845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8748689Z method(*args, **kwargs) 2025-12-04T09:28:36.8749357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8750123Z with policy(): 2025-12-04T09:28:36.8750766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8751477Z raise RuntimeError(msg) 2025-12-04T09:28:36.8753075Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 531562496 and is now 617545728. 2025-12-04T09:28:36.8754599Z 2025-12-04T09:28:36.8754796Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8756118Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8757242Z 2025-12-04T09:28:36.8757494Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8757867Z 2025-12-04T09:28:36.8757871Z 2025-12-04T09:28:36.8758080Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:36.8758654Z Process 0 terminated with exit code 10, terminating remaining processes. 
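Note on the leak report above: the RuntimeError comes from the mem_leak_check harness enabled for this shard, which snapshots per-device memory counters before the test body and compares them afterwards; the PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 command it prints re-enables the same check locally. The sketch below only illustrates that before/after comparison, it is not the actual implementation in torch.testing._internal.common_utils, and run_suspect_code is a hypothetical stand-in for the failing test body.

import torch

def check_for_cuda_leak(run_suspect_code, device: int = 0) -> None:
    # Simplified stand-in for the CI leak check: record caching-allocator and
    # driver-level memory before the suspect code, then compare afterwards.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)      # caching allocator, bytes
    free_before, _total = torch.cuda.mem_get_info(device)   # driver-level free, bytes

    run_suspect_code()

    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    if alloc_after > alloc_before and free_after < free_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after} bytes, driver free memory "
            f"{free_before} -> {free_after} bytes"
        )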
2025-12-04T09:28:36.8759897Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-b1a99f4c33297699.xml - 2025-12-04T09:28:36.8761113Z =========================== short test summary info ============================ 2025-12-04T09:28:36.8762509Z FAILED [9.2708s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:36.8763797Z Traceback (most recent call last): 2025-12-04T09:28:36.8764497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8765190Z getattr(self, test_name)() 2025-12-04T09:28:36.8765845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8766515Z fn() 2025-12-04T09:28:36.8767075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8767728Z method(*args, **kwargs) 2025-12-04T09:28:36.8768347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8769004Z method(*args, **kwargs) 2025-12-04T09:28:36.8769616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8770265Z with policy(): 2025-12-04T09:28:36.8770855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8771511Z raise RuntimeError(msg) 2025-12-04T09:28:36.8773021Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 632225792 and is now 720306176. 
2025-12-04T09:28:36.8774805Z 2025-12-04T09:28:36.8775013Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8776501Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8777712Z 2025-12-04T09:28:36.8777979Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8778383Z 2025-12-04T09:28:36.8778562Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:36.8779180Z Traceback (most recent call last): 2025-12-04T09:28:36.8779976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8780778Z getattr(self, test_name)() 2025-12-04T09:28:36.8781532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8782295Z fn() 2025-12-04T09:28:36.8782953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8783714Z method(*args, **kwargs) 2025-12-04T09:28:36.8784421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8785180Z method(*args, **kwargs) 2025-12-04T09:28:36.8785889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8786646Z with policy(): 2025-12-04T09:28:36.8787311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8788077Z raise RuntimeError(msg) 2025-12-04T09:28:36.8789784Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 531562496 and is now 617545728. 2025-12-04T09:28:36.8791529Z 2025-12-04T09:28:36.8791743Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8793006Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8794074Z 2025-12-04T09:28:36.8794312Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8794837Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:28:36.8795280Z ======================= 1 failed, 14 deselected in 9.48s ======================= 2025-12-04T09:28:36.8795641Z Got exit code 1 2025-12-04T09:28:36.8796672Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.8798043Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:36.8799205Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7332aead750b9bce.xml 2025-12-04T09:28:36.8800146Z ============================= test session starts ============================== 2025-12-04T09:28:36.8800720Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:36.8801249Z cachedir: .pytest_cache 2025-12-04T09:28:36.8801877Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:36.8802556Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:36.8802876Z configfile: pytest.ini 2025-12-04T09:28:36.8803525Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:36.8805305Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.8806645Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:36.8807973Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.8809318Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:36.8809714Z collected 15 items / 1 deselected / 14 selected 2025-12-04T09:28:36.8810085Z stepcurrent: skipping 1 already run items. 2025-12-04T09:28:36.8810429Z Running 14 items in this shard 2025-12-04T09:28:36.8810617Z 2025-12-04T09:28:36.8811877Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda I1204 09:19:06.180000 35305 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 35357 2025-12-04T09:28:36.8813945Z I1204 09:19:06.181000 35305 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 35358 2025-12-04T09:28:36.8815066Z I1204 09:19:06.181000 35305 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 35359 2025-12-04T09:28:36.8816197Z I1204 09:19:06.182000 35305 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 35360 2025-12-04T09:28:36.8819248Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.8821937Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.8824534Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.8827112Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.8829397Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.8831712Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.8834008Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.8836546Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.8841016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.8845864Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.8850738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.8855917Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.8860921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.8866079Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.8870855Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:28:36.8875784Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.8876711Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.8877758Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.8879670Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8881265Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.8882842Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8884310Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.8885753Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8887375Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8888914Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8890431Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8892108Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8893658Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.8895324Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8896851Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.8899387Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 640614400 and is now 726597632. 
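Note on the UserWarning repeated above: it describes an AccumulateGrad node kept alive from a previous iteration (for example via a retained loss tensor, or DDP stashing a reference) receiving a gradient produced on a different stream, and it names the switch to use when the mismatch is intentional. The minimal sketch below only sets the flag named in the warning and guards for builds that expose it; it is not taken from the test itself.

import torch

# Silence the AccumulateGrad stream-mismatch warning quoted above when the
# cross-stream accumulation is intentional; the setting is global, so place it
# before the backward passes that would otherwise warn.
if hasattr(torch.autograd.graph, "set_warn_on_accumulate_grad_stream_mismatch"):
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)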
2025-12-04T09:28:36.8901770Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8902888Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8905241Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.8907307Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8908337Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8909543Z E1204 09:19:13.320000 35357 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:36.8910521Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.8911479Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.8912906Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8914313Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.8915723Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8917023Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.8918367Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8919717Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8921080Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8922439Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8923803Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8925131Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.8926452Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8927829Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.8930073Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:36.8932193Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8933293Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8935676Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.8937631Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8938805Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8940171Z E1204 09:19:13.323000 35358 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:36.8941267Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.8942333Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.8943958Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8945659Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.8947206Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8948559Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.8949854Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8951221Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8952582Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:28:36.8953940Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8955498Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8956905Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.8958308Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8959754Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.8962135Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:36.8964418Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8965463Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.8967586Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.8969473Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8970520Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.8971706Z E1204 09:19:13.324000 35359 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:36.8972670Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.8973854Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.8975514Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.8977089Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.8978905Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.8980385Z E1204 09:19:13.325000 35360 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.8981845Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8983384Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8984905Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.8986435Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.8987968Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.8989456Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.8991042Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.8992487Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.8994947Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 
2025-12-04T09:28:36.8997196Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.8998247Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9000380Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9002285Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9003330Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9004529Z E1204 09:19:13.325000 35360 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:36.9005211Z FAILED [9.0106s] [ 7%] 2025-12-04T09:28:36.9005372Z 2025-12-04T09:28:36.9005506Z =================================== FAILURES =================================== 2025-12-04T09:28:36.9006313Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:36.9007093Z Traceback (most recent call last): 2025-12-04T09:28:36.9007795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:36.9008569Z self._join_processes(fn) 2025-12-04T09:28:36.9009284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:36.9010057Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:36.9010842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:36.9011595Z raise RuntimeError(error) 2025-12-04T09:28:36.9012006Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:36.9012450Z Traceback (most recent call last): 2025-12-04T09:28:36.9013133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9014107Z getattr(self, test_name)() 2025-12-04T09:28:36.9014868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9015642Z fn() 2025-12-04T09:28:36.9016282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9017043Z method(*args, **kwargs) 2025-12-04T09:28:36.9017753Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9018498Z method(*args, **kwargs) 2025-12-04T09:28:36.9019207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9019953Z with policy(): 2025-12-04T09:28:36.9020634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T09:28:36.9021386Z raise RuntimeError(msg) 2025-12-04T09:28:36.9023186Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:36.9024818Z 2025-12-04T09:28:36.9025039Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9026598Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9027660Z 2025-12-04T09:28:36.9027900Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9028270Z 2025-12-04T09:28:36.9028274Z 2025-12-04T09:28:36.9028473Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:36.9029041Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:36.9030238Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7332aead750b9bce.xml - 2025-12-04T09:28:36.9031341Z =========================== short test summary info ============================ 2025-12-04T09:28:36.9032687Z FAILED [9.0106s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:36.9033975Z Traceback (most recent call last): 2025-12-04T09:28:36.9034676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9035386Z getattr(self, test_name)() 2025-12-04T09:28:36.9036100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9036783Z fn() 2025-12-04T09:28:36.9037364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9038018Z method(*args, **kwargs) 2025-12-04T09:28:36.9038633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9039290Z method(*args, **kwargs) 2025-12-04T09:28:36.9039902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9040551Z with policy(): 2025-12-04T09:28:36.9041137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9041795Z raise RuntimeError(msg) 2025-12-04T09:28:36.9043303Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 
2025-12-04T09:28:36.9044741Z 2025-12-04T09:28:36.9044932Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9046189Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9047252Z 2025-12-04T09:28:36.9047492Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9048009Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:36.9048438Z ======================= 1 failed, 1 deselected in 9.22s ======================== 2025-12-04T09:28:36.9048796Z Got exit code 1 2025-12-04T09:28:36.9049021Z Retrying single test... 2025-12-04T09:28:36.9049892Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c7d658062419b597.xml 2025-12-04T09:28:36.9050807Z ============================= test session starts ============================== 2025-12-04T09:28:36.9051376Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:36.9051886Z cachedir: .pytest_cache 2025-12-04T09:28:36.9052485Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:36.9053151Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:36.9053525Z configfile: pytest.ini 2025-12-04T09:28:36.9054389Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:36.9056295Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.9057805Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:36.9059275Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.9060764Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:36.9061175Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:36.9062575Z stepcurrent: skipping 1 already run items. 
Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9064006Z Running 1 items in this shard 2025-12-04T09:28:36.9064208Z 2025-12-04T09:28:36.9065728Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda I1204 09:19:19.900000 35638 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 35690 2025-12-04T09:28:36.9067642Z I1204 09:19:19.900000 35638 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 35691 2025-12-04T09:28:36.9068630Z I1204 09:19:19.901000 35638 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 35692 2025-12-04T09:28:36.9069614Z I1204 09:19:19.902000 35638 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 35693 2025-12-04T09:28:36.9072304Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9074613Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9076890Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9079539Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9082198Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9084807Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9087374Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:36.9089962Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9094722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9099753Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9104724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9109368Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9113838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:28:36.9118399Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9123086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9127737Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9128621Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9129683Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9131282Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9132673Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9134333Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9135781Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9137208Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9138736Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9140246Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9141768Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9143276Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9144745Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T09:28:36.9146451Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9147798Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9150020Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 393150464 and is now 615448576. 2025-12-04T09:28:36.9152133Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9153108Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9155109Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9156827Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9157849Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9159033Z E1204 09:19:27.200000 35693 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:36.9160047Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9160981Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9162391Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9163789Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9165175Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9166459Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9167725Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9169084Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9170426Z E1204 09:19:27.201000 35691 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9171764Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9173107Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9174816Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9176293Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9177816Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9201299Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 527368192 and is now 615448576. 2025-12-04T09:28:36.9203789Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9204865Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9207043Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9208919Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9210025Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9211458Z E1204 09:19:27.201000 35691 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:36.9212491Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9213614Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9215376Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9216946Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9218510Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9219964Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9221389Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9222898Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9224414Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9226112Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9227634Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9229005Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9230389Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9231862Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9234090Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 628031488 and is now 720306176. 
2025-12-04T09:28:36.9236189Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9237149Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9239128Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9240842Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9241918Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9243304Z E1204 09:19:27.202000 35690 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:36.9244295Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9245288Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9246788Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9248265Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9249739Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9251098Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9252439Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9254118Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9255627Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9257126Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9259312Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9260785Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9262267Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9263780Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9266349Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:36.9268431Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9269401Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9271385Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9273153Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9274178Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9275374Z E1204 09:19:27.205000 35692 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:36.9276044Z FAILED [9.2296s] [100%] 2025-12-04T09:28:36.9276203Z 2025-12-04T09:28:36.9276346Z =================================== FAILURES =================================== 2025-12-04T09:28:36.9277135Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:36.9277905Z Traceback (most recent call last): 2025-12-04T09:28:36.9278730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:36.9279685Z self._join_processes(fn) 2025-12-04T09:28:36.9280480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:36.9281344Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:36.9282212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:36.9283064Z raise RuntimeError(error) 2025-12-04T09:28:36.9283510Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:36.9283988Z Traceback (most recent call last): 2025-12-04T09:28:36.9284764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9285555Z getattr(self, test_name)() 2025-12-04T09:28:36.9286296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9287041Z fn() 2025-12-04T09:28:36.9287770Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9288509Z method(*args, **kwargs) 2025-12-04T09:28:36.9289198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9289935Z method(*args, **kwargs) 2025-12-04T09:28:36.9290630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9291367Z with policy(): 2025-12-04T09:28:36.9292079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9292752Z raise RuntimeError(msg) 2025-12-04T09:28:36.9294602Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 393150464 and is now 615448576. 2025-12-04T09:28:36.9296225Z 2025-12-04T09:28:36.9296454Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9297858Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9299057Z 2025-12-04T09:28:36.9299322Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9299734Z 2025-12-04T09:28:36.9299738Z 2025-12-04T09:28:36.9300045Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:36.9300665Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:36.9301995Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c7d658062419b597.xml - 2025-12-04T09:28:36.9303221Z =========================== short test summary info ============================ 2025-12-04T09:28:36.9304745Z FAILED [9.2296s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:36.9306259Z Traceback (most recent call last): 2025-12-04T09:28:36.9306955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9307653Z getattr(self, test_name)() 2025-12-04T09:28:36.9308313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9308993Z fn() 2025-12-04T09:28:36.9309564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9310231Z method(*args, **kwargs) 2025-12-04T09:28:36.9310853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9311515Z method(*args, **kwargs) 2025-12-04T09:28:36.9312128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9312780Z with policy(): 2025-12-04T09:28:36.9313372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9314048Z raise RuntimeError(msg) 2025-12-04T09:28:36.9315598Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 393150464 and is now 615448576. 2025-12-04T09:28:36.9317040Z 2025-12-04T09:28:36.9317231Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9318478Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9319532Z 2025-12-04T09:28:36.9319771Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9320284Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:36.9320714Z ======================= 1 failed, 14 deselected in 9.44s ======================= 2025-12-04T09:28:36.9321079Z Got exit code 1 2025-12-04T09:28:36.9321312Z Retrying single test... 
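
[editor's note] Each worker above also prints a FutureWarning pointing away from FSDP.state_dict_type()/FSDP.set_state_dict_type() toward the torch.distributed.checkpoint.state_dict APIs. The following is a minimal, hedged sketch of that replacement path (not taken from test_fsdp_dtensor_state_dict.py); it assumes `model` is an already FSDP-wrapped module with an initialized process group, and uses the model-only counterparts of the get_state_dict()/set_state_dict() functions named in the warning.

    # Hedged sketch of the API the FutureWarning recommends (PyTorch 2.x).
    # Assumes `model` is FSDP-wrapped and torch.distributed is initialized.
    from torch.distributed.checkpoint.state_dict import (
        StateDictOptions,
        get_model_state_dict,
        set_model_state_dict,
    )

    # Keep the state dict sharded and on GPU; cpu_offload=True would mirror the
    # offload_to_cpu_True variants of this test.
    options = StateDictOptions(cpu_offload=False)

    # Replaces the FSDP.set_state_dict_type(...) context + model.state_dict() pattern.
    sharded_state_dict = get_model_state_dict(model, options=options)

    # Replaces loading under the same state_dict_type context.
    set_model_state_dict(model, sharded_state_dict, options=options)
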
2025-12-04T09:28:36.9322138Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-348cc3a828a50222.xml 2025-12-04T09:28:36.9323069Z ============================= test session starts ============================== 2025-12-04T09:28:36.9323645Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:36.9324169Z cachedir: .pytest_cache 2025-12-04T09:28:36.9324780Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:36.9325464Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:36.9325767Z configfile: pytest.ini 2025-12-04T09:28:36.9326445Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:36.9328146Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.9329481Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:36.9330796Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.9332138Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:36.9332516Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:36.9334037Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9335417Z Running 1 items in this shard 2025-12-04T09:28:36.9335626Z 2025-12-04T09:28:36.9337039Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda I1204 09:19:33.770000 35971 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 36023 2025-12-04T09:28:36.9339056Z I1204 09:19:33.771000 35971 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 36024 2025-12-04T09:28:36.9340185Z I1204 09:19:33.772000 35971 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 36025 2025-12-04T09:28:36.9341307Z I1204 09:19:33.772000 35971 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 36026 2025-12-04T09:28:36.9344422Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:36.9347196Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9349607Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9352064Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9354347Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9356655Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9358937Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9361297Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9365680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9370346Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9375427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9380584Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9385590Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9390673Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9395213Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:28:36.9399670Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9400514Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9401469Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9402900Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9404292Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9405688Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9406975Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9408252Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9409671Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9411019Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9412361Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9413959Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9415438Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9416920Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9418450Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9420961Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 640614400 and is now 720306176. 
2025-12-04T09:28:36.9423395Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9424497Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9426856Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9428566Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9429595Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9430783Z E1204 09:19:41.041000 36023 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:36.9431742Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9432676Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9434105Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9435504Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9436902Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9438201Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9439516Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9440869Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9442222Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9443574Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9444917Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9446233Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9447555Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9448907Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9451135Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 518979584 and is now 611254272. 2025-12-04T09:28:36.9453368Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9454602Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9456846Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9458786Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9459946Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9461277Z E1204 09:19:41.041000 36024 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:36.9462354Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9463417Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9465022Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9466727Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9468121Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9469470Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9470737Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9472088Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9473427Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:28:36.9474782Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9476134Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9477442Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9478892Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9480568Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9483090Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 523173888 and is now 611254272. 2025-12-04T09:28:36.9485552Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9486654Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9488899Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9490853Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9492040Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9493282Z E1204 09:19:41.042000 36025 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:36.9494474Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9495533Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9497139Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9498711Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9500359Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9501819Z E1204 09:19:41.043000 36026 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9503254Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9504768Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9506455Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9507807Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9509155Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9510469Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9511791Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9513153Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9515417Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 
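The per-rank RuntimeErrors above are produced by the CUDA memory-leak check this job runs with (the same check the printed repro commands re-enable via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1): allocator statistics are captured before the test body and compared afterwards, and any growth is reported as a leak. Below is a minimal sketch of that before/after comparison, assuming at least one visible CUDA device; check_cuda_leak is a hypothetical helper, not the actual CudaMemoryLeakCheck in common_utils.py, which additionally consults driver-level counters (the second pair of numbers in each message).

    import gc
    import torch

    def check_cuda_leak(fn):
        """Run fn() and raise if the CUDA caching allocator still holds memory afterwards."""
        assert torch.cuda.is_available(), "sketch assumes at least one visible CUDA device"
        devices = range(torch.cuda.device_count())
        torch.cuda.synchronize()
        before = [torch.cuda.memory_allocated(d) for d in devices]

        fn()

        gc.collect()                # drop Python references that may still pin tensors
        torch.cuda.synchronize()
        after = [torch.cuda.memory_allocated(d) for d in devices]

        for device, (b, a) in enumerate(zip(before, after)):
            if a > b:
                raise RuntimeError(
                    f"possible CUDA leak: caching allocator allocated memory was {b} "
                    f"and is now reported as {a} on device {device}"
                )

    # Illustrative usage: check_cuda_leak(lambda: run_one_test())  # run_one_test is hypothetical

In the messages above the caching allocator still reports 7680 bytes per rank after the test body returns, which is exactly the condition such a check trips on.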
2025-12-04T09:28:36.9517517Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9518497Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9520491Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9522223Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9523242Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9524434Z E1204 09:19:41.043000 36026 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:36.9525098Z FAILED [9.1022s] [100%] 2025-12-04T09:28:36.9525259Z 2025-12-04T09:28:36.9525404Z =================================== FAILURES =================================== 2025-12-04T09:28:36.9526184Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:36.9526303Z Traceback (most recent call last): 2025-12-04T09:28:36.9526785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:36.9526939Z self._join_processes(fn) 2025-12-04T09:28:36.9527458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:36.9527587Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:36.9528124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:36.9528225Z raise RuntimeError(error) 2025-12-04T09:28:36.9528430Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:36.9528544Z Traceback (most recent call last): 2025-12-04T09:28:36.9529020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9529130Z getattr(self, test_name)() 2025-12-04T09:28:36.9529606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9529690Z fn() 2025-12-04T09:28:36.9530142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9530235Z method(*args, **kwargs) 2025-12-04T09:28:36.9530681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9530781Z method(*args, **kwargs) 2025-12-04T09:28:36.9531226Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9531320Z with policy(): 2025-12-04T09:28:36.9531768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T09:28:36.9531916Z raise RuntimeError(msg) 2025-12-04T09:28:36.9533305Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 518979584 and is now 611254272. 2025-12-04T09:28:36.9533312Z 2025-12-04T09:28:36.9533508Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9534727Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9534732Z 2025-12-04T09:28:36.9535000Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9535011Z 2025-12-04T09:28:36.9535016Z 2025-12-04T09:28:36.9535245Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:36.9535508Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:36.9536449Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-348cc3a828a50222.xml - 2025-12-04T09:28:36.9536624Z =========================== short test summary info ============================ 2025-12-04T09:28:36.9537830Z FAILED [9.1022s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:36.9537959Z Traceback (most recent call last): 2025-12-04T09:28:36.9538506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9538622Z getattr(self, test_name)() 2025-12-04T09:28:36.9539224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9539310Z fn() 2025-12-04T09:28:36.9539821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9539927Z method(*args, **kwargs) 2025-12-04T09:28:36.9540428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9540540Z method(*args, **kwargs) 2025-12-04T09:28:36.9541040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9541142Z with policy(): 2025-12-04T09:28:36.9541652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9541764Z raise RuntimeError(msg) 2025-12-04T09:28:36.9543266Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 518979584 and is now 611254272. 
2025-12-04T09:28:36.9543273Z 2025-12-04T09:28:36.9543487Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9544555Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9544561Z 2025-12-04T09:28:36.9544825Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9545065Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:36.9545244Z ======================= 1 failed, 14 deselected in 9.32s ======================= 2025-12-04T09:28:36.9545345Z Got exit code 1 2025-12-04T09:28:36.9546480Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9546844Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:36.9547511Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-bb573131fa19ab29.xml 2025-12-04T09:28:36.9547664Z ============================= test session starts ============================== 2025-12-04T09:28:36.9547976Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:36.9548083Z cachedir: .pytest_cache 2025-12-04T09:28:36.9548540Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:36.9548649Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:36.9548754Z configfile: pytest.ini 2025-12-04T09:28:36.9549228Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:36.9550345Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.9550469Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:36.9551552Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.9551765Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:36.9551897Z collected 15 items / 2 deselected / 13 selected 2025-12-04T09:28:36.9552022Z stepcurrent: skipping 2 already run items. 
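The PytestCollectionWarning lines during collection are benign: pytest treats any class whose name matches its Test* pattern as a candidate test class, but refuses to collect it once the class defines __init__, which is the case for the TestDummyModel / TestDummyModelUneven helper modules in this file. A minimal sketch reproducing the same warning, using a hypothetical file and test function rather than the test file's own code:

    # hypothetical test_example.py -- `pytest test_example.py` emits
    # "PytestCollectionWarning: cannot collect test class 'TestDummyModel'
    # because it has a __init__ constructor" and skips the class, while the
    # plain test function below is collected normally.
    import torch

    class TestDummyModel(torch.nn.Module):   # name matches pytest's Test* pattern
        def __init__(self) -> None:          # __init__ is why pytest refuses to collect it
            super().__init__()
            self.net = torch.nn.Linear(8, 8)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x)

    def test_dummy_model_forward():
        model = TestDummyModel()
        out = model(torch.randn(2, 8))
        assert out.shape == (2, 8)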
2025-12-04T09:28:36.9552129Z Running 13 items in this shard 2025-12-04T09:28:36.9552134Z 2025-12-04T09:28:36.9553384Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda I1204 09:19:47.650000 36304 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 36356 2025-12-04T09:28:36.9553832Z I1204 09:19:47.651000 36304 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 36357 2025-12-04T09:28:36.9554269Z I1204 09:19:47.652000 36304 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 36358 2025-12-04T09:28:36.9554714Z I1204 09:19:47.652000 36304 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 36359 2025-12-04T09:28:36.9556845Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9556947Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9559070Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9559218Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9561326Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9561427Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9563547Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9563642Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9567691Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9568051Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9572018Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9572374Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9576914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9577422Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9582110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9582502Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9583033Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9583541Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9584514Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9585004Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9585963Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9586345Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9587287Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9587752Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9588678Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9589132Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9590146Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9590647Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9591484Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9591888Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] raise 
RuntimeError(msg) 2025-12-04T09:28:36.9593593Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 168755200 and is now 617545728. 2025-12-04T09:28:36.9593898Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9594456Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9595780Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9596076Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9596703Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9597220Z E1204 09:19:54.944000 36359 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:36.9597607Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9598053Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9598905Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9599334Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9600186Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9600522Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9601350Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9601753Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9602581Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9603039Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9603872Z E1204 09:19:54.946000 36356 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9604244Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9605074Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9605482Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9607187Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 636420096 and is now 720306176. 2025-12-04T09:28:36.9607494Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9608056Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9609386Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9609683Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9610353Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9610811Z E1204 09:19:54.946000 36356 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:36.9611182Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9611634Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9612491Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9612922Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9614037Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9614416Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9615339Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9615798Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9616740Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9617273Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9618208Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9618623Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9619562Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9620022Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9621943Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 
2025-12-04T09:28:36.9622282Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9622910Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9624400Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9624795Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9625492Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9626193Z E1204 09:19:54.947000 36358 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:36.9626568Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9627018Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9627873Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9628313Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9629157Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9629497Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9630313Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9630767Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9631602Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9632003Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9632837Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9633202Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9634035Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9634447Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9636158Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 523173888 and is now 611254272. 2025-12-04T09:28:36.9636459Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9637013Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9638389Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9638684Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9639301Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9639756Z E1204 09:19:54.949000 36357 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:36.9639846Z FAILED [9.2324s] [ 7%] 2025-12-04T09:28:36.9639851Z 2025-12-04T09:28:36.9639990Z =================================== FAILURES =================================== 2025-12-04T09:28:36.9640537Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:36.9640660Z Traceback (most recent call last): 2025-12-04T09:28:36.9641146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:36.9641247Z self._join_processes(fn) 2025-12-04T09:28:36.9641775Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:36.9641902Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:36.9642446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:36.9642545Z raise RuntimeError(error) 2025-12-04T09:28:36.9642754Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:36.9642923Z Traceback (most recent call last): 2025-12-04T09:28:36.9643400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9643500Z getattr(self, test_name)() 2025-12-04T09:28:36.9643979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9644060Z fn() 2025-12-04T09:28:36.9644513Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9644605Z method(*args, **kwargs) 2025-12-04T09:28:36.9645052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9645151Z method(*args, **kwargs) 2025-12-04T09:28:36.9645594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9645682Z with policy(): 2025-12-04T09:28:36.9646136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9646234Z raise RuntimeError(msg) 2025-12-04T09:28:36.9647569Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:36.9647575Z 2025-12-04T09:28:36.9647765Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9648714Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9648723Z 2025-12-04T09:28:36.9648959Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9648963Z 2025-12-04T09:28:36.9648968Z 2025-12-04T09:28:36.9649226Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:36.9649466Z Process 2 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:36.9650290Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-bb573131fa19ab29.xml - 2025-12-04T09:28:36.9650447Z =========================== short test summary info ============================ 2025-12-04T09:28:36.9651524Z FAILED [9.2324s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:36.9651638Z Traceback (most recent call last): 2025-12-04T09:28:36.9652129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9652229Z getattr(self, test_name)() 2025-12-04T09:28:36.9652705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9652781Z fn() 2025-12-04T09:28:36.9653293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9653397Z method(*args, **kwargs) 2025-12-04T09:28:36.9654037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9654143Z method(*args, **kwargs) 2025-12-04T09:28:36.9654652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9654812Z with policy(): 2025-12-04T09:28:36.9655326Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9655433Z raise RuntimeError(msg) 2025-12-04T09:28:36.9656919Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:36.9656933Z 2025-12-04T09:28:36.9657149Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9658206Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9658216Z 2025-12-04T09:28:36.9658495Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9658676Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:36.9658862Z ======================= 1 failed, 2 deselected in 9.45s ======================== 2025-12-04T09:28:36.9658959Z Got exit code 1 2025-12-04T09:28:36.9659062Z Retrying single test... 
2025-12-04T09:28:36.9659828Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-16546bb6943a3c11.xml 2025-12-04T09:28:36.9659988Z ============================= test session starts ============================== 2025-12-04T09:28:36.9660332Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:36.9660449Z cachedir: .pytest_cache 2025-12-04T09:28:36.9660967Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:36.9661094Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:36.9661255Z configfile: pytest.ini 2025-12-04T09:28:36.9661791Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:36.9663052Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.9663186Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:36.9664409Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.9664569Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:36.9664715Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:36.9665965Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9666067Z Running 1 items in this shard 2025-12-04T09:28:36.9666072Z 2025-12-04T09:28:36.9667324Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda I1204 09:20:01.489000 36637 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 36689 2025-12-04T09:28:36.9667763Z I1204 09:20:01.490000 36637 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 36690 2025-12-04T09:28:36.9668249Z I1204 09:20:01.491000 36637 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 36691 2025-12-04T09:28:36.9668692Z I1204 09:20:01.492000 36637 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 36692 2025-12-04T09:28:36.9670815Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:36.9670923Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9673035Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9673147Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9675242Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9675355Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9677496Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9677604Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9682214Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9682631Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9687108Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9687597Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9692085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9692431Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9697051Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:28:36.9697451Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9697882Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9698391Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9699371Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9699847Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9700808Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9701248Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9702179Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9702642Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9703567Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9704032Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9704967Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9705387Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9706404Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9706851Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9708781Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 
2025-12-04T09:28:36.9709110Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9709725Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9711160Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9711490Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9712158Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9712657Z E1204 09:20:08.760000 36690 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:36.9713072Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9713555Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9714499Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9714961Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9716054Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9716396Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9717263Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9717695Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9718563Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9719000Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9719873Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9720271Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9721325Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9721770Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9723700Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:28:36.9724026Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9724643Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9726081Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9726419Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9727085Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9727583Z E1204 09:20:08.760000 36692 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:36.9727994Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9728477Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9729423Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9729936Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9730876Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9731232Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9732128Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9732575Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9733537Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:28:36.9734176Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9735105Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9735534Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9736462Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9736925Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9738913Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 518979584 and is now 611254272. 2025-12-04T09:28:36.9739246Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9739882Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9741362Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9741703Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9742386Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9742900Z E1204 09:20:08.763000 36691 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:36.9743329Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9743827Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9744856Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9745334Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9746483Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9746828Z E1204 09:20:08.766000 36689 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9747701Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9748145Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9749075Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9749483Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9750306Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9750671Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9751505Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9751966Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9753677Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 632225792 and is now 720306176. 
2025-12-04T09:28:36.9753974Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9754537Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9755847Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9756146Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9756745Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9757204Z E1204 09:20:08.766000 36689 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:36.9757302Z FAILED [9.3666s] [100%] 2025-12-04T09:28:36.9757373Z 2025-12-04T09:28:36.9757506Z =================================== FAILURES =================================== 2025-12-04T09:28:36.9758060Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:36.9758169Z Traceback (most recent call last): 2025-12-04T09:28:36.9758841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:36.9758956Z self._join_processes(fn) 2025-12-04T09:28:36.9759501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:36.9759639Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:36.9760218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:36.9760323Z raise RuntimeError(error) 2025-12-04T09:28:36.9760559Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:36.9760673Z Traceback (most recent call last): 2025-12-04T09:28:36.9761182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9761297Z getattr(self, test_name)() 2025-12-04T09:28:36.9761794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9761886Z fn() 2025-12-04T09:28:36.9762356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9762451Z method(*args, **kwargs) 2025-12-04T09:28:36.9762933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9763025Z method(*args, **kwargs) 2025-12-04T09:28:36.9763498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9763598Z with policy(): 2025-12-04T09:28:36.9764560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T09:28:36.9764675Z raise RuntimeError(msg) 2025-12-04T09:28:36.9766076Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:28:36.9766082Z 2025-12-04T09:28:36.9766284Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9767297Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9767307Z 2025-12-04T09:28:36.9767554Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9767563Z 2025-12-04T09:28:36.9767724Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:36.9767837Z Traceback (most recent call last): 2025-12-04T09:28:36.9768343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9768458Z getattr(self, test_name)() 2025-12-04T09:28:36.9768956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9769053Z fn() 2025-12-04T09:28:36.9769625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9769771Z method(*args, **kwargs) 2025-12-04T09:28:36.9770220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9770309Z method(*args, **kwargs) 2025-12-04T09:28:36.9770765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9770850Z with policy(): 2025-12-04T09:28:36.9771299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9771404Z raise RuntimeError(msg) 2025-12-04T09:28:36.9772724Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 518979584 and is now 611254272. 2025-12-04T09:28:36.9772733Z 2025-12-04T09:28:36.9772928Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9774150Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9774157Z 2025-12-04T09:28:36.9774420Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9774425Z 2025-12-04T09:28:36.9774439Z 2025-12-04T09:28:36.9774657Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:36.9774917Z Process 1 terminated with exit code 10, terminating remaining processes. 
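The RuntimeError above comes from PyTorch's CUDA mem-leak check (enabled via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1), which snapshots caching-allocator and driver-level allocations around the test body and fails when both grew, as reported per device in the messages above. A rough illustrative Python sketch of that idea follows; it is an approximation, not the actual check in torch/testing/_internal/common_utils.py, and run_with_leak_check is a hypothetical helper name.

import torch

def run_with_leak_check(fn, device: int = 0):
    # Snapshot allocations before the test body (illustrative only).
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)
    driver_before = total - free_before

    fn()  # the test body under measurement

    # Snapshot again after the test body and compare both counters.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after

    if alloc_after > alloc_before and driver_after > driver_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after} bytes, driver "
            f"{driver_before} -> {driver_after} bytes"
        )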
2025-12-04T09:28:36.9775856Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-16546bb6943a3c11.xml - 2025-12-04T09:28:36.9776027Z =========================== short test summary info ============================ 2025-12-04T09:28:36.9777298Z FAILED [9.3666s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:36.9777420Z Traceback (most recent call last): 2025-12-04T09:28:36.9777963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9778082Z getattr(self, test_name)() 2025-12-04T09:28:36.9778793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9778884Z fn() 2025-12-04T09:28:36.9779400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9779508Z method(*args, **kwargs) 2025-12-04T09:28:36.9780019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9780124Z method(*args, **kwargs) 2025-12-04T09:28:36.9780618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9780722Z with policy(): 2025-12-04T09:28:36.9781227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9781331Z raise RuntimeError(msg) 2025-12-04T09:28:36.9782838Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 
2025-12-04T09:28:36.9782940Z 2025-12-04T09:28:36.9783154Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9784230Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9784236Z 2025-12-04T09:28:36.9784498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9784503Z 2025-12-04T09:28:36.9784673Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:36.9784795Z Traceback (most recent call last): 2025-12-04T09:28:36.9785342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9785461Z getattr(self, test_name)() 2025-12-04T09:28:36.9785991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9786092Z fn() 2025-12-04T09:28:36.9786607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9786708Z method(*args, **kwargs) 2025-12-04T09:28:36.9787220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9787324Z method(*args, **kwargs) 2025-12-04T09:28:36.9787823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9787932Z with policy(): 2025-12-04T09:28:36.9788436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9788559Z raise RuntimeError(msg) 2025-12-04T09:28:36.9790129Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 518979584 and is now 611254272. 2025-12-04T09:28:36.9790139Z 2025-12-04T09:28:36.9790356Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9791439Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9791445Z 2025-12-04T09:28:36.9791677Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9791845Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:36.9792000Z ======================= 1 failed, 14 deselected in 9.58s ======================= 2025-12-04T09:28:36.9792089Z Got exit code 1 2025-12-04T09:28:36.9792196Z Retrying single test... 
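The FutureWarning printed by every worker in the run above (and again in the retry below) points at the replacement checkpoint API in torch.distributed.checkpoint.state_dict. A minimal sketch of that API, using generic model/optimizer placeholders and illustrative StateDictOptions values rather than the configuration used by this test:

import torch
from torch.distributed.checkpoint.state_dict import (
    StateDictOptions,
    get_state_dict,
    set_state_dict,
)

def checkpoint_roundtrip(model: torch.nn.Module, optimizer: torch.optim.Optimizer):
    # Sharded state dicts with CPU offload; option values are illustrative.
    options = StateDictOptions(full_state_dict=False, cpu_offload=True)
    # Works across FSDP1, FSDP2 and DDP, unlike FSDP.set_state_dict_type().
    model_sd, optim_sd = get_state_dict(model, optimizer, options=options)
    # ... save/load model_sd and optim_sd, e.g. via torch.distributed.checkpoint ...
    set_state_dict(
        model,
        optimizer,
        model_state_dict=model_sd,
        optim_state_dict=optim_sd,
        options=options,
    )
    return model_sd, optim_sd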
2025-12-04T09:28:36.9792868Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-cdd2e74ccc0956b9.xml 2025-12-04T09:28:36.9793023Z ============================= test session starts ============================== 2025-12-04T09:28:36.9793333Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:36.9793425Z cachedir: .pytest_cache 2025-12-04T09:28:36.9793892Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:36.9793996Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:36.9794099Z configfile: pytest.ini 2025-12-04T09:28:36.9794571Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:36.9795795Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.9795926Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:36.9797005Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.9797151Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:36.9797280Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:36.9798284Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9798396Z Running 1 items in this shard 2025-12-04T09:28:36.9798401Z 2025-12-04T09:28:36.9799642Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda I1204 09:20:15.320000 36970 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 37022 2025-12-04T09:28:36.9800088Z I1204 09:20:15.321000 36970 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 37023 2025-12-04T09:28:36.9800525Z I1204 09:20:15.321000 36970 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 37024 2025-12-04T09:28:36.9800952Z I1204 09:20:15.322000 36970 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 37025 2025-12-04T09:28:36.9803152Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:36.9803256Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9805383Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9805485Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9807596Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9807693Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9809796Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9809945Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9814224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9814622Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9819150Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9819553Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9824041Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9824439Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9828928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:28:36.9829355Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9829782Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9830275Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9831304Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9831766Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9832668Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9833020Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9833896Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9834478Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9835310Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9835712Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9836543Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9836911Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9837745Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9838151Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9839860Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 636420096 and is now 726597632. 
2025-12-04T09:28:36.9840159Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9840764Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9842095Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9842393Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9843011Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9843466Z E1204 09:20:22.660000 37022 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:36.9843850Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9844296Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9845152Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9845581Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9846423Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9846755Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9847629Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9848032Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9848863Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9849267Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9850093Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9850463Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9851306Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9851711Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9853458Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:36.9854013Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9854648Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9856140Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9856474Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9857171Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9857691Z E1204 09:20:22.665000 37024 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:36.9858113Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9858619Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9859582Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9860069Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9861024Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9861405Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9862409Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9862864Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9863798Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:28:36.9864256Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9865197Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9865613Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9866587Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9866998Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9868704Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 518979584 and is now 611254272. 2025-12-04T09:28:36.9869065Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9869623Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9870948Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9871242Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9871861Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9872320Z E1204 09:20:22.665000 37023 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:36.9872694Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9873148Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9874008Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9874434Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9875331Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9875664Z E1204 09:20:22.667000 37025 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9876485Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9876886Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9877707Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9878111Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9879254Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9879676Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9880621Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9881081Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9882995Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 397344768 and is now 611254272. 
2025-12-04T09:28:36.9883428Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9884053Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9885539Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9885873Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9886569Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9887083Z E1204 09:20:22.667000 37025 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:36.9887181Z FAILED [9.1900s] [100%] 2025-12-04T09:28:36.9887187Z 2025-12-04T09:28:36.9887340Z =================================== FAILURES =================================== 2025-12-04T09:28:36.9887948Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:36.9888078Z Traceback (most recent call last): 2025-12-04T09:28:36.9888619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:36.9888734Z self._join_processes(fn) 2025-12-04T09:28:36.9889397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:36.9889542Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:36.9890141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:36.9890268Z raise RuntimeError(error) 2025-12-04T09:28:36.9890503Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:36.9890634Z Traceback (most recent call last): 2025-12-04T09:28:36.9891174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9891287Z getattr(self, test_name)() 2025-12-04T09:28:36.9891890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9891970Z fn() 2025-12-04T09:28:36.9892433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9892528Z method(*args, **kwargs) 2025-12-04T09:28:36.9892969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9893074Z method(*args, **kwargs) 2025-12-04T09:28:36.9893762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9893861Z with policy(): 2025-12-04T09:28:36.9894384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T09:28:36.9894569Z raise RuntimeError(msg) 2025-12-04T09:28:36.9896136Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:36.9896142Z 2025-12-04T09:28:36.9896357Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9897419Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9897436Z 2025-12-04T09:28:36.9897698Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9897704Z 2025-12-04T09:28:36.9897708Z 2025-12-04T09:28:36.9897924Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:36.9898206Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:36.9899146Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-cdd2e74ccc0956b9.xml - 2025-12-04T09:28:36.9899324Z =========================== short test summary info ============================ 2025-12-04T09:28:36.9900529Z FAILED [9.1900s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:36.9900649Z Traceback (most recent call last): 2025-12-04T09:28:36.9901208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9901325Z getattr(self, test_name)() 2025-12-04T09:28:36.9901872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9901962Z fn() 2025-12-04T09:28:36.9902523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9902640Z method(*args, **kwargs) 2025-12-04T09:28:36.9903143Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9903245Z method(*args, **kwargs) 2025-12-04T09:28:36.9903754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9903853Z with policy(): 2025-12-04T09:28:36.9904364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9904476Z raise RuntimeError(msg) 2025-12-04T09:28:36.9906070Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 531562496 and is now 611254272. 
2025-12-04T09:28:36.9906084Z 2025-12-04T09:28:36.9906274Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9907212Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9907217Z 2025-12-04T09:28:36.9907464Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9907619Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:36.9907846Z ======================= 1 failed, 14 deselected in 9.40s ======================= 2025-12-04T09:28:36.9907932Z Got exit code 1 2025-12-04T09:28:36.9908801Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:36.9909175Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:36.9909841Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-9650dbe5a6e76fd8.xml 2025-12-04T09:28:36.9910002Z ============================= test session starts ============================== 2025-12-04T09:28:36.9910306Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:36.9910403Z cachedir: .pytest_cache 2025-12-04T09:28:36.9910873Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:36.9910982Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:36.9911082Z configfile: pytest.ini 2025-12-04T09:28:36.9911561Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:36.9912672Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.9912797Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:36.9913885Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:36.9914026Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:36.9914169Z collected 15 items / 3 deselected / 12 selected 2025-12-04T09:28:36.9914344Z stepcurrent: skipping 3 already run items. 
2025-12-04T09:28:36.9914453Z Running 12 items in this shard 2025-12-04T09:28:36.9914458Z 2025-12-04T09:28:36.9915690Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda I1204 09:20:29.270000 37303 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 37355 2025-12-04T09:28:36.9916129Z I1204 09:20:29.271000 37303 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 37356 2025-12-04T09:28:36.9916571Z I1204 09:20:29.271000 37303 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 37357 2025-12-04T09:28:36.9917007Z I1204 09:20:29.272000 37303 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 37358 2025-12-04T09:28:36.9919138Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9919241Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9921351Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9921505Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9923625Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9923721Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9925842Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:36.9925941Z FSDP.set_state_dict_type( 2025-12-04T09:28:36.9929954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9930307Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9934594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9935001Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9939451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9939911Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9944369Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:36.9944769Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:36.9945201Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9945875Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9946857Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9947291Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9948142Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9948467Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9949304Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9949708Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9950534Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9950939Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9951768Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9952193Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9953021Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9953437Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] raise 
RuntimeError(msg) 2025-12-04T09:28:36.9955120Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 2025-12-04T09:28:36.9955427Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9955989Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9957303Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9957600Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9958207Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9958674Z E1204 09:20:36.581000 37355 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:36.9959097Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9959549Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9960412Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9960843Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9961691Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9962019Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9962852Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9963257Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9964089Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9964495Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9965392Z E1204 09:20:36.582000 37357 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9965758Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9966589Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9967008Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9968698Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 527368192 and is now 615448576. 2025-12-04T09:28:36.9969013Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9969565Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9970888Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9971183Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9971796Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9972316Z E1204 09:20:36.582000 37357 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:36.9972689Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9973146Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9974289Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9974780Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9975747Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9976114Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9977050Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9977507Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9978446Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9979138Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9980069Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9980498Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9981433Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9981902Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9983819Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 518979584 and is now 611254272. 
2025-12-04T09:28:36.9984160Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9984789Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:36.9986286Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:36.9986623Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9987413Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:36.9987940Z E1204 09:20:36.583000 37356 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:36.9988361Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:36.9988868Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:36.9989833Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:36.9990312Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:36.9991309Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:36.9991636Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:36.9992468Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9992873Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9993772Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:36.9994178Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:36.9995000Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:36.9995372Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:36.9996194Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:36.9996611Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:36.9998300Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 506396672 and is now 611254272. 2025-12-04T09:28:36.9998605Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:36.9999161Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0000482Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0000834Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0001445Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0001915Z E1204 09:20:36.590000 37358 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.0002004Z FAILED [9.4491s] [ 8%] 2025-12-04T09:28:37.0002009Z 2025-12-04T09:28:37.0002150Z =================================== FAILURES =================================== 2025-12-04T09:28:37.0002690Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.0002802Z Traceback (most recent call last): 2025-12-04T09:28:37.0003291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.0003390Z self._join_processes(fn) 2025-12-04T09:28:37.0003915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.0004043Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.0004579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.0004693Z raise RuntimeError(error) 2025-12-04T09:28:37.0004900Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.0005006Z Traceback (most recent call last): 2025-12-04T09:28:37.0005491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0005643Z getattr(self, test_name)() 2025-12-04T09:28:37.0006131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0006210Z fn() 2025-12-04T09:28:37.0006655Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0006755Z method(*args, **kwargs) 2025-12-04T09:28:37.0007200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0007291Z method(*args, **kwargs) 2025-12-04T09:28:37.0007740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0007826Z with policy(): 2025-12-04T09:28:37.0008289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0008382Z raise RuntimeError(msg) 2025-12-04T09:28:37.0009703Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 2025-12-04T09:28:37.0009709Z 2025-12-04T09:28:37.0009910Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0010846Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0010851Z 2025-12-04T09:28:37.0011098Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0011107Z 2025-12-04T09:28:37.0011251Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.0011368Z Traceback (most recent call last): 2025-12-04T09:28:37.0011966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0012064Z getattr(self, test_name)() 2025-12-04T09:28:37.0012547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0012625Z fn() 2025-12-04T09:28:37.0013069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0013171Z method(*args, **kwargs) 2025-12-04T09:28:37.0013853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0013973Z method(*args, **kwargs) 2025-12-04T09:28:37.0014477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0014596Z with policy(): 2025-12-04T09:28:37.0015113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0015220Z raise RuntimeError(msg) 2025-12-04T09:28:37.0016717Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 527368192 and is now 615448576. 
2025-12-04T09:28:37.0016735Z 2025-12-04T09:28:37.0016947Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0017994Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0018063Z 2025-12-04T09:28:37.0018346Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0018351Z 2025-12-04T09:28:37.0018355Z 2025-12-04T09:28:37.0018571Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.0018849Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:37.0019784Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-9650dbe5a6e76fd8.xml - 2025-12-04T09:28:37.0019952Z =========================== short test summary info ============================ 2025-12-04T09:28:37.0021162Z FAILED [9.4491s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.0021289Z Traceback (most recent call last): 2025-12-04T09:28:37.0021853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0021964Z getattr(self, test_name)() 2025-12-04T09:28:37.0022498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0022593Z fn() 2025-12-04T09:28:37.0023097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0023213Z method(*args, **kwargs) 2025-12-04T09:28:37.0023712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0023817Z method(*args, **kwargs) 2025-12-04T09:28:37.0024327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0024478Z with policy(): 2025-12-04T09:28:37.0024985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0025103Z raise RuntimeError(msg) 2025-12-04T09:28:37.0026749Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 
2025-12-04T09:28:37.0026755Z 2025-12-04T09:28:37.0026971Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0027957Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0027966Z 2025-12-04T09:28:37.0028230Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0028234Z 2025-12-04T09:28:37.0028385Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.0028496Z Traceback (most recent call last): 2025-12-04T09:28:37.0029015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0029117Z getattr(self, test_name)() 2025-12-04T09:28:37.0029627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0029709Z fn() 2025-12-04T09:28:37.0030178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0030334Z method(*args, **kwargs) 2025-12-04T09:28:37.0030987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0031088Z method(*args, **kwargs) 2025-12-04T09:28:37.0031582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0031673Z with policy(): 2025-12-04T09:28:37.0032169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0032274Z raise RuntimeError(msg) 2025-12-04T09:28:37.0033709Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 527368192 and is now 615448576. 2025-12-04T09:28:37.0033730Z 2025-12-04T09:28:37.0033939Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0034964Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0034970Z 2025-12-04T09:28:37.0035228Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0035400Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.0035577Z ======================= 1 failed, 3 deselected in 9.66s ======================== 2025-12-04T09:28:37.0035670Z Got exit code 1 2025-12-04T09:28:37.0035772Z Retrying single test... 
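The repeated "To execute this test, run the following from the base repo dir" lines above give the local reproduction command; the failure itself is raised by PyTorch's CUDA memory-leak checker, which snapshots per-device memory counters before the test body and compares them afterwards. Below is a minimal, single-process sketch of that before/after comparison. The helper name check_cuda_leak is made up for this note; the real check lives in torch/testing/_internal/common_utils.py (CudaMemoryLeakCheck), and the failing tests here additionally spawn one worker process per GPU, which this sketch does not model.

    # Illustrative sketch only: an approximation of the per-device before/after
    # comparison performed when PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 is set.
    # `check_cuda_leak` is a hypothetical name, not a torch API.
    import torch

    def check_cuda_leak(run_test_body):
        num_devices = torch.cuda.device_count()
        torch.cuda.empty_cache()
        before_alloc = [torch.cuda.memory_allocated(i) for i in range(num_devices)]
        before_used = []
        for i in range(num_devices):
            free, total = torch.cuda.mem_get_info(i)
            before_used.append(total - free)  # rough stand-in for "CUDA driver allocated memory"

        run_test_body()  # the test body, e.g. the FSDP state_dict load exercised above

        for i in range(num_devices):
            torch.cuda.synchronize(i)
        torch.cuda.empty_cache()
        for i in range(num_devices):
            after_alloc = torch.cuda.memory_allocated(i)
            free, total = torch.cuda.mem_get_info(i)
            after_used = total - free
            if after_alloc > before_alloc[i] or after_used > before_used[i]:
                raise RuntimeError(
                    f"possible CUDA leak on device {i}: caching allocator "
                    f"{before_alloc[i]} -> {after_alloc}, driver {before_used[i]} -> {after_used}"
                )

As the log itself notes, the repro banner can be silenced with PYTORCH_PRINT_REPRO_ON_FAILURE=0, while the leak check is enabled by the PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 prefix in the printed repro command.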
2025-12-04T09:28:37.0036515Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-53bea78db525054e.xml 2025-12-04T09:28:37.0036673Z ============================= test session starts ============================== 2025-12-04T09:28:37.0037061Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.0037169Z cachedir: .pytest_cache 2025-12-04T09:28:37.0037670Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.0037792Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.0037890Z configfile: pytest.ini 2025-12-04T09:28:37.0038408Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.0039630Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0039762Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.0040951Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0041099Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.0041242Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.0042338Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0042446Z Running 1 items in this shard 2025-12-04T09:28:37.0042451Z 2025-12-04T09:28:37.0043867Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda I1204 09:20:43.230000 37636 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 37688 2025-12-04T09:28:37.0044345Z I1204 09:20:43.231000 37636 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 37689 2025-12-04T09:28:37.0044820Z I1204 09:20:43.232000 37636 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 37690 2025-12-04T09:28:37.0045303Z I1204 09:20:43.232000 37636 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 37691 2025-12-04T09:28:37.0047692Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.0047811Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0050048Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0050160Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0052433Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0052550Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0055080Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0055209Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0059711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:37.0060178Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:37.0064629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:37.0065046Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:37.0069269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:37.0069620Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:37.0073569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:28:37.0073925Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:37.0074303Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0074749Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0075623Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0076110Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0076973Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0077298Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0078120Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0078542Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0079736Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0080205Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0081134Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0081564Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0082497Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0082966Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0084977Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 628031488 and is now 724500480. 
2025-12-04T09:28:37.0085315Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0085957Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0087447Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0087801Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0088485Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0089015Z E1204 09:20:50.451000 37688 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.0089438Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0089936Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0090987Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0091550Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0092403Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0092725Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0093762Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0094229Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0095162Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0095632Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0096555Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0096974Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0097910Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0098429Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0100348Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:28:37.0100680Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0101317Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0102812Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0103152Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0103837Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0104348Z E1204 09:20:50.452000 37690 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.0104773Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0105327Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0106366Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0106788Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0107641Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0107964Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0108787Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0109203Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0110024Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:28:37.0110437Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0111255Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0111631Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0112509Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0112916Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0114618Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 523173888 and is now 613351424. 2025-12-04T09:28:37.0114916Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0115483Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0116788Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0117086Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0117689Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0118142Z E1204 09:20:50.454000 37689 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.0118871Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0119314Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0120181Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0120604Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0121454Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0121784Z E1204 09:20:50.461000 37691 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0122612Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0123021Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0123840Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0124246Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0125066Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0125488Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0126313Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0126721Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0128417Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 
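The torch/autograd/graph.py:865 UserWarning interleaved with these tracebacks states its own opt-out: torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False). A minimal sketch of where that call would sit is below; it only silences the diagnostic and does not change how, or on which stream, gradients are accumulated, so whether suppressing it is appropriate depends on the workload.

    import torch

    # Disable the AccumulateGrad stream-mismatch warning quoted in this log.
    # This is the exact knob named by the warning text; it suppresses the message
    # only and leaves gradient accumulation behavior unchanged.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)

    # ... construct the model and run forward/backward as usual ...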
2025-12-04T09:28:37.0128719Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0129278Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0130584Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0130881Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0131660Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0132200Z E1204 09:20:50.461000 37691 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.0132301Z FAILED [9.1647s] [100%] 2025-12-04T09:28:37.0132307Z 2025-12-04T09:28:37.0132442Z =================================== FAILURES =================================== 2025-12-04T09:28:37.0133022Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.0133138Z Traceback (most recent call last): 2025-12-04T09:28:37.0133873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.0133996Z self._join_processes(fn) 2025-12-04T09:28:37.0134577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.0134730Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.0135333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.0135445Z raise RuntimeError(error) 2025-12-04T09:28:37.0135685Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.0135802Z Traceback (most recent call last): 2025-12-04T09:28:37.0136332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0136444Z getattr(self, test_name)() 2025-12-04T09:28:37.0136979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0137068Z fn() 2025-12-04T09:28:37.0137570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0137679Z method(*args, **kwargs) 2025-12-04T09:28:37.0138246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0138352Z method(*args, **kwargs) 2025-12-04T09:28:37.0138848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0138947Z with policy(): 2025-12-04T09:28:37.0139452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T09:28:37.0139564Z raise RuntimeError(msg) 2025-12-04T09:28:37.0141049Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 523173888 and is now 613351424. 2025-12-04T09:28:37.0141060Z 2025-12-04T09:28:37.0141274Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0142341Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0142347Z 2025-12-04T09:28:37.0142608Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0142613Z 2025-12-04T09:28:37.0142618Z 2025-12-04T09:28:37.0142838Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.0143096Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:37.0144032Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-53bea78db525054e.xml - 2025-12-04T09:28:37.0144255Z =========================== short test summary info ============================ 2025-12-04T09:28:37.0145460Z FAILED [9.1647s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.0145689Z Traceback (most recent call last): 2025-12-04T09:28:37.0146567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0146672Z getattr(self, test_name)() 2025-12-04T09:28:37.0147142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0147217Z fn() 2025-12-04T09:28:37.0147672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0147762Z method(*args, **kwargs) 2025-12-04T09:28:37.0148208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0148304Z method(*args, **kwargs) 2025-12-04T09:28:37.0148745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0148834Z with policy(): 2025-12-04T09:28:37.0149279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0149369Z raise RuntimeError(msg) 2025-12-04T09:28:37.0150687Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 523173888 and is now 613351424. 
2025-12-04T09:28:37.0150697Z 2025-12-04T09:28:37.0150937Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0151876Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0151881Z 2025-12-04T09:28:37.0152113Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0152274Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.0152424Z ======================= 1 failed, 14 deselected in 9.38s ======================= 2025-12-04T09:28:37.0152511Z Got exit code 1 2025-12-04T09:28:37.0152607Z Retrying single test... 2025-12-04T09:28:37.0153278Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-470dd7f8801a129e.xml 2025-12-04T09:28:37.0153420Z ============================= test session starts ============================== 2025-12-04T09:28:37.0153731Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.0153824Z cachedir: .pytest_cache 2025-12-04T09:28:37.0154283Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.0154388Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.0154477Z configfile: pytest.ini 2025-12-04T09:28:37.0154950Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.0156065Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0156239Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.0157327Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0157461Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.0157593Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.0158591Z stepcurrent: skipping 3 already run items. 
Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0158692Z Running 1 items in this shard 2025-12-04T09:28:37.0158701Z 2025-12-04T09:28:37.0159933Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda I1204 09:20:57.030000 37969 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 38021 2025-12-04T09:28:37.0160376Z I1204 09:20:57.030000 37969 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 38022 2025-12-04T09:28:37.0160816Z I1204 09:20:57.031000 37969 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 38023 2025-12-04T09:28:37.0161242Z I1204 09:20:57.032000 37969 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 38024 2025-12-04T09:28:37.0163424Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0163525Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0165636Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0165735Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0167845Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0167941Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0170049Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:240: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.0170205Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0174466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:37.0174864Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:37.0179504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:37.0180005Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:37.0184465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:28:37.0184861Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:37.0189311Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:37.0189781Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:37.0190207Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0190815Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0191675Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0192109Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0192956Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0193279Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0194104Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0194506Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0195389Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0195792Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0196620Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0196987Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T09:28:37.0197809Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0198231Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0199918Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 628031488 and is now 720306176. 2025-12-04T09:28:37.0200222Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0200774Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0202149Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0202445Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0203054Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0203513Z E1204 09:21:04.255000 38021 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.0203883Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0204341Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0205202Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0205638Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0206491Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0206814Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0207645Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0208099Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0208926Z E1204 09:21:04.255000 38023 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0209327Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0210151Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0210521Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0211347Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0211758Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0213511Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 523173888 and is now 615448576. 2025-12-04T09:28:37.0214008Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0214700Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0216188Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0216518Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0217203Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0217721Z E1204 09:21:04.255000 38023 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.0218145Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0218655Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0219619Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0220101Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0221057Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0221422Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0222428Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0222884Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0223824Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0224275Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0225198Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0225620Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0226586Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0227000Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0228686Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 531562496 and is now 611254272. 
2025-12-04T09:28:37.0229041Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0229599Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0230917Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0231210Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0231814Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0232278Z E1204 09:21:04.256000 38022 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.0232656Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0233110Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0233970Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0234388Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0235237Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0235562Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0236502Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0236911Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0237739Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0238143Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0238967Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0239344Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0240173Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0240595Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0242284Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:37.0242649Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0243211Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0244523Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0244817Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0245430Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0245894Z E1204 09:21:04.258000 38024 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.0245983Z FAILED [9.1224s] [100%] 2025-12-04T09:28:37.0245989Z 2025-12-04T09:28:37.0246131Z =================================== FAILURES =================================== 2025-12-04T09:28:37.0246671Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.0246779Z Traceback (most recent call last): 2025-12-04T09:28:37.0247270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.0247370Z self._join_processes(fn) 2025-12-04T09:28:37.0247895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.0248021Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.0248606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.0248721Z raise RuntimeError(error) 2025-12-04T09:28:37.0248930Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.0249036Z Traceback (most recent call last): 2025-12-04T09:28:37.0249524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0249621Z getattr(self, test_name)() 2025-12-04T09:28:37.0250098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0250183Z fn() 2025-12-04T09:28:37.0250629Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0250733Z method(*args, **kwargs) 2025-12-04T09:28:37.0251180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0251271Z method(*args, **kwargs) 2025-12-04T09:28:37.0251722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0251806Z with policy(): 2025-12-04T09:28:37.0252263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0252362Z raise RuntimeError(msg) 2025-12-04T09:28:37.0253927Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 523173888 and is now 615448576. 2025-12-04T09:28:37.0254029Z 2025-12-04T09:28:37.0254260Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0255312Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0255318Z 2025-12-04T09:28:37.0255588Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0255594Z 2025-12-04T09:28:37.0255598Z 2025-12-04T09:28:37.0255815Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.0256081Z Process 2 terminated with exit code 10, terminating remaining processes. 
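The repro line above can also be driven programmatically, which is sometimes convenient when bisecting a leak locally. The wrapper below is an illustrative assumption rather than part of the harness (the function name run_repro is made up); it sets PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 exactly as the log instructs and shows where PYTORCH_PRINT_REPRO_ON_FAILURE=0 would go if the repro hint should be silenced.

    import os
    import subprocess

    def run_repro():
        # Re-run the single failing test from the base repo dir with the leak check enabled.
        env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
        # env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"  # uncomment to silence the repro hint
        cmd = [
            "python",
            "test/distributed/fsdp/test_fsdp_dtensor_state_dict.py",
            "TestFSDPWithDeviceMeshAndDTensorCUDA."
            "test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda",
        ]
        return subprocess.run(cmd, env=env, check=False).returncode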
2025-12-04T09:28:37.0257008Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-470dd7f8801a129e.xml - 2025-12-04T09:28:37.0257182Z =========================== short test summary info ============================ 2025-12-04T09:28:37.0258390Z FAILED [9.1224s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.0258510Z Traceback (most recent call last): 2025-12-04T09:28:37.0259069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0259180Z getattr(self, test_name)() 2025-12-04T09:28:37.0259716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0259822Z fn() 2025-12-04T09:28:37.0260328Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0260449Z method(*args, **kwargs) 2025-12-04T09:28:37.0261009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0261110Z method(*args, **kwargs) 2025-12-04T09:28:37.0261622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0261715Z with policy(): 2025-12-04T09:28:37.0262221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0262342Z raise RuntimeError(msg) 2025-12-04T09:28:37.0263822Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 523173888 and is now 615448576. 2025-12-04T09:28:37.0263833Z 2025-12-04T09:28:37.0264060Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0265115Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0265121Z 2025-12-04T09:28:37.0265396Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0265680Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
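A side note on the FSDP.set_state_dict_type FutureWarning emitted both earlier in this shard and again in the session that follows: the warning points to get_state_dict() and set_state_dict() in torch.distributed.checkpoint.state_dict as the replacement API. The function below is a hedged outline of that migration based on the doc URL in the warning; arguments are kept minimal and the model and optimizer are passed in by the caller, so treat it as a sketch rather than the test's actual code.

    import torch
    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    def save_and_restore(model: torch.nn.Module, optimizer: torch.optim.Optimizer):
        # Replacement flow suggested by the FutureWarning: one call returns both the
        # model and optimizer state dicts, and a symmetric call loads them back.
        model_sd, optim_sd = get_state_dict(model, optimizer)
        set_state_dict(model, optimizer,
                       model_state_dict=model_sd,
                       optim_state_dict=optim_sd)
        return model_sd, optim_sd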
2025-12-04T09:28:37.0265848Z ======================= 1 failed, 14 deselected in 9.34s ======================= 2025-12-04T09:28:37.0266059Z Got exit code 1 2025-12-04T09:28:37.0267029Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0300190Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:37.0301031Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-864712a0594b6ca2.xml 2025-12-04T09:28:37.0301195Z ============================= test session starts ============================== 2025-12-04T09:28:37.0301555Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.0301660Z cachedir: .pytest_cache 2025-12-04T09:28:37.0302171Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.0302306Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.0302408Z configfile: pytest.ini 2025-12-04T09:28:37.0302941Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.0304212Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0304343Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.0305572Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0305723Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.0305980Z collected 15 items / 4 deselected / 11 selected 2025-12-04T09:28:37.0306117Z stepcurrent: skipping 4 already run items. 2025-12-04T09:28:37.0306220Z Running 11 items in this shard 2025-12-04T09:28:37.0306228Z 2025-12-04T09:28:37.0307930Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda I1204 09:21:10.819000 38302 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 38354 2025-12-04T09:28:37.0308410Z I1204 09:21:10.820000 38302 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 38355 2025-12-04T09:28:37.0308888Z I1204 09:21:10.821000 38302 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 38356 2025-12-04T09:28:37.0309354Z I1204 09:21:10.822000 38302 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 38357 2025-12-04T09:28:37.0311686Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0311804Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0314192Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0314380Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0316818Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0316930Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0319231Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0319350Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0321014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0321143Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0322793Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0322921Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0324620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.0324738Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0326391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0326509Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0326935Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0327428Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0328370Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0328833Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0329759Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0330176Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0335775Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0336234Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0337162Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0337616Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0338552Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0338993Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0339919Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0340378Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0342396Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 640614400 and is now 720306176. 2025-12-04T09:28:37.0342742Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0343371Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0344861Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0345194Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0345998Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0346600Z E1204 09:21:18.143000 38354 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.0346995Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0347472Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0348462Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0348893Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0349774Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0350146Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0350973Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0351374Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0352199Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0352607Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0353447Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0353814Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0354634Z 
E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0355048Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0356804Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 518979584 and is now 611254272. 2025-12-04T09:28:37.0357109Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0357666Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0358988Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0359282Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0359893Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0360358Z E1204 09:21:18.143000 38355 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.0360727Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0361174Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0362024Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0362494Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0363336Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0363701Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0364535Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0364936Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0365774Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0366176Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0367008Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0367374Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0368199Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0368617Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0370369Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:37.0370678Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0371235Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0372561Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0372856Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0373552Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0374239Z E1204 09:21:18.144000 38356 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.0374658Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0375155Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0376179Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0376688Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0377640Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0378010Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0379142Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0379613Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0380545Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0381009Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0381939Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0382352Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0383295Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0383853Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0385778Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 523173888 and is now 611254272. 
2025-12-04T09:28:37.0386108Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0386744Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0388229Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0388571Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0389259Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0389773Z E1204 09:21:18.145000 38357 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.0389884Z FAILED [9.0996s] [ 9%] 2025-12-04T09:28:37.0389930Z 2025-12-04T09:28:37.0390077Z =================================== FAILURES =================================== 2025-12-04T09:28:37.0390798Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:37.0390957Z Traceback (most recent call last): 2025-12-04T09:28:37.0391434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.0391540Z self._join_processes(fn) 2025-12-04T09:28:37.0392057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.0392180Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.0392719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.0392820Z raise RuntimeError(error) 2025-12-04T09:28:37.0393032Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.0393137Z Traceback (most recent call last): 2025-12-04T09:28:37.0393612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0393719Z getattr(self, test_name)() 2025-12-04T09:28:37.0394186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0394265Z fn() 2025-12-04T09:28:37.0394720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0394811Z method(*args, **kwargs) 2025-12-04T09:28:37.0395261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0395350Z method(*args, **kwargs) 2025-12-04T09:28:37.0395790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0395880Z with policy(): 2025-12-04T09:28:37.0396377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T09:28:37.0396487Z raise RuntimeError(msg) 2025-12-04T09:28:37.0397805Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 640614400 and is now 720306176. 2025-12-04T09:28:37.0397811Z 2025-12-04T09:28:37.0397998Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0398955Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0398963Z 2025-12-04T09:28:37.0399199Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0399204Z 2025-12-04T09:28:37.0399210Z 2025-12-04T09:28:37.0399411Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.0399644Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:37.0400469Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-864712a0594b6ca2.xml - 2025-12-04T09:28:37.0400617Z =========================== short test summary info ============================ 2025-12-04T09:28:37.0401691Z FAILED [9.0996s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.0401827Z Traceback (most recent call last): 2025-12-04T09:28:37.0402319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0402453Z getattr(self, test_name)() 2025-12-04T09:28:37.0402924Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0403000Z fn() 2025-12-04T09:28:37.0403448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0403538Z method(*args, **kwargs) 2025-12-04T09:28:37.0403979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0404078Z method(*args, **kwargs) 2025-12-04T09:28:37.0404519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0404610Z with policy(): 2025-12-04T09:28:37.0405062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0405157Z raise RuntimeError(msg) 2025-12-04T09:28:37.0406492Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 640614400 and is now 720306176. 
2025-12-04T09:28:37.0406498Z 2025-12-04T09:28:37.0406688Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0407635Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0407642Z 2025-12-04T09:28:37.0407939Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0408105Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.0408257Z ======================= 1 failed, 4 deselected in 9.31s ======================== 2025-12-04T09:28:37.0408344Z Got exit code 1 2025-12-04T09:28:37.0408437Z Retrying single test... 2025-12-04T09:28:37.0409105Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c88c69879eff0a17.xml 2025-12-04T09:28:37.0409243Z ============================= test session starts ============================== 2025-12-04T09:28:37.0409558Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.0409652Z cachedir: .pytest_cache 2025-12-04T09:28:37.0410113Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.0410221Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.0410313Z configfile: pytest.ini 2025-12-04T09:28:37.0410795Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.0411899Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0412022Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.0413099Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0413332Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.0413475Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.0414794Z stepcurrent: skipping 4 already run items. 
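The FutureWarning repeated for each rank above points at the replacement API in torch.distributed.checkpoint.state_dict. As a rough sketch of that migration (assuming a model already wrapped in FSDP and an initialized process group; the helper name save_and_restore and the option values are illustrative, not taken from the test file):

import torch
from torch.distributed.checkpoint.state_dict import (
    StateDictOptions,
    get_state_dict,
    set_state_dict,
)

def save_and_restore(model: torch.nn.Module, optim: torch.optim.Optimizer) -> None:
    # Ask for sharded (DTensor) state dicts, optionally offloaded to CPU,
    # instead of calling FSDP.set_state_dict_type(...) as the test still does.
    opts = StateDictOptions(full_state_dict=False, cpu_offload=True)

    # get_state_dict returns (model_state_dict, optimizer_state_dict).
    model_sd, optim_sd = get_state_dict(model, optim, options=opts)

    # ... persist / reload the dicts with torch.distributed.checkpoint ...

    # set_state_dict applies both dicts in one call and, per the warning text,
    # works across FSDP1, FSDP2 and DDP.
    set_state_dict(
        model,
        optim,
        model_state_dict=model_sd,
        optim_state_dict=optim_sd,
        options=opts,
    )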
Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0414913Z Running 1 items in this shard 2025-12-04T09:28:37.0414919Z 2025-12-04T09:28:37.0416330Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda I1204 09:21:24.690000 38635 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 38687 2025-12-04T09:28:37.0416832Z I1204 09:21:24.691000 38635 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 38688 2025-12-04T09:28:37.0417331Z I1204 09:21:24.692000 38635 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 38689 2025-12-04T09:28:37.0417821Z I1204 09:21:24.692000 38635 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 38690 2025-12-04T09:28:37.0420221Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0420333Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0422764Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0422876Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0425251Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0425358Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0427723Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.0427822Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0429353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0429495Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0431017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0431164Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0432675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0432798Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0434312Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.0434428Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0434803Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0435425Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0436354Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0436854Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0437767Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0438113Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0438987Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0439420Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0440291Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0440728Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0441783Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0442189Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0443088Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0443572Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0445468Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 531562496 and is now 617545728. 
2025-12-04T09:28:37.0445793Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0446409Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0447859Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0448193Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0448962Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0449452Z E1204 09:21:31.933000 38690 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.0449845Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0450316Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0451282Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0451732Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0452631Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0452970Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0454088Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0454553Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0455474Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0455935Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0456866Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0457285Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0458251Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0458747Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0460674Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 518979584 and is now 611254272. 2025-12-04T09:28:37.0461004Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0461636Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0463129Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0463470Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0464149Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0464667Z E1204 09:21:31.937000 38688 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.0465089Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0465743Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0466617Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0467042Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0467895Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0468219Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0469046Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0469457Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0470272Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0470684Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0471500Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0471904Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0472723Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0473155Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0474859Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 523173888 and is now 611254272. 2025-12-04T09:28:37.0475150Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0475713Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0477026Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0477324Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0477932Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0478392Z E1204 09:21:31.939000 38689 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.0478980Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0479680Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0480657Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0481134Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0482100Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0482471Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0483395Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0483860Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0484785Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0485252Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0486231Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0486699Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0487634Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0488092Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0490033Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 640614400 and is now 720306176. 
2025-12-04T09:28:37.0490367Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0490998Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0492465Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0492766Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0493457Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0494266Z E1204 09:21:31.941000 38687 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.0494379Z FAILED [9.1327s] [100%] 2025-12-04T09:28:37.0494385Z 2025-12-04T09:28:37.0494574Z =================================== FAILURES =================================== 2025-12-04T09:28:37.0495197Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:37.0495316Z Traceback (most recent call last): 2025-12-04T09:28:37.0495855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.0495978Z self._join_processes(fn) 2025-12-04T09:28:37.0496557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.0496714Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.0497318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.0497433Z raise RuntimeError(error) 2025-12-04T09:28:37.0497670Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.0497785Z Traceback (most recent call last): 2025-12-04T09:28:37.0498320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0498433Z getattr(self, test_name)() 2025-12-04T09:28:37.0498956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0499083Z fn() 2025-12-04T09:28:37.0499580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0499688Z method(*args, **kwargs) 2025-12-04T09:28:37.0500194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0500329Z method(*args, **kwargs) 2025-12-04T09:28:37.0500827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0500927Z with policy(): 2025-12-04T09:28:37.0501429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T09:28:37.0501544Z raise RuntimeError(msg) 2025-12-04T09:28:37.0503032Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 531562496 and is now 617545728. 2025-12-04T09:28:37.0503045Z 2025-12-04T09:28:37.0503260Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0504330Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0504336Z 2025-12-04T09:28:37.0504602Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0504607Z 2025-12-04T09:28:37.0504617Z 2025-12-04T09:28:37.0504832Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.0505091Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:37.0506215Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c88c69879eff0a17.xml - 2025-12-04T09:28:37.0506421Z =========================== short test summary info ============================ 2025-12-04T09:28:37.0507489Z FAILED [9.1327s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.0507600Z Traceback (most recent call last): 2025-12-04T09:28:37.0508084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0508188Z getattr(self, test_name)() 2025-12-04T09:28:37.0508660Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0508740Z fn() 2025-12-04T09:28:37.0509192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0509286Z method(*args, **kwargs) 2025-12-04T09:28:37.0509734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0509826Z method(*args, **kwargs) 2025-12-04T09:28:37.0510268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0510356Z with policy(): 2025-12-04T09:28:37.0510798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0510891Z raise RuntimeError(msg) 2025-12-04T09:28:37.0512231Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 531562496 and is now 617545728. 
2025-12-04T09:28:37.0512300Z 2025-12-04T09:28:37.0512486Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0513437Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0513442Z 2025-12-04T09:28:37.0513675Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0513839Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.0513989Z ======================= 1 failed, 14 deselected in 9.35s ======================= 2025-12-04T09:28:37.0514072Z Got exit code 1 2025-12-04T09:28:37.0514172Z Retrying single test... 2025-12-04T09:28:37.0514840Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-ecd21e7500304b9f.xml 2025-12-04T09:28:37.0514991Z ============================= test session starts ============================== 2025-12-04T09:28:37.0515298Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.0515388Z cachedir: .pytest_cache 2025-12-04T09:28:37.0515853Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.0515958Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.0516048Z configfile: pytest.ini 2025-12-04T09:28:37.0516526Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.0517684Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0517812Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.0518894Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0519030Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.0519165Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.0520168Z stepcurrent: skipping 4 already run items. 
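For context on the leak messages above: the repro lines show the check is enabled with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, and its repro hint can be silenced with PYTORCH_PRINT_REPRO_ON_FAILURE=0. A simplified sketch of the two counters being compared (an illustration only, not the actual checker in torch/testing/_internal/common_utils.py):

import gc
import torch

def snapshot(device: int) -> tuple[int, int]:
    # Settle pending work and drop cached blocks before measuring.
    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    # "Caching allocator allocated memory" in the error text: bytes PyTorch's
    # caching allocator currently holds as live tensors.
    allocator_bytes = torch.cuda.memory_allocated(device)
    # "CUDA driver allocated memory": what the driver reports as in use,
    # derived here from cudaMemGetInfo as total minus free.
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    driver_bytes = total_bytes - free_bytes
    return allocator_bytes, driver_bytes

def grew(before: tuple[int, int], after: tuple[int, int]) -> bool:
    # In this sketch a leak is flagged only when both counters increased,
    # which is the pattern the failures above report on devices 0-3
    # (e.g. 0 -> 2560 allocator bytes, 640614400 -> 720306176 driver bytes).
    return after[0] > before[0] and after[1] > before[1]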
Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0520279Z Running 1 items in this shard 2025-12-04T09:28:37.0520284Z 2025-12-04T09:28:37.0521765Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda I1204 09:21:38.560000 38968 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 39020 2025-12-04T09:28:37.0522239Z I1204 09:21:38.561000 38968 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 39021 2025-12-04T09:28:37.0522699Z I1204 09:21:38.561000 38968 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 39022 2025-12-04T09:28:37.0523153Z I1204 09:21:38.562000 38968 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 39023 2025-12-04T09:28:37.0525412Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0525570Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0527818Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0527922Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0530170Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0530273Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0532512Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.0532615Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0534578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0534708Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0536411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0536544Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0538249Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0538380Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0540080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.0540238Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0540669Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0541200Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0542175Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0542651Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0543621Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0543988Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0544928Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0545388Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0546460Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0546870Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0547693Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0548117Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0548946Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0549361Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0551285Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 531562496 and is now 617545728. 
2025-12-04T09:28:37.0551613Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0552199Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0553596Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0553919Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0554584Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0555079Z E1204 09:21:45.799000 39021 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.0555502Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0555970Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0556876Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0557321Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0558233Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0558576Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0559454Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0559878Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0560747Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0561181Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0562096Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0562503Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0563369Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0563803Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0565612Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 628031488 and is now 722403328. 2025-12-04T09:28:37.0565925Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0566521Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0567913Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0568275Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0568921Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0569445Z E1204 09:21:45.800000 39020 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.0569837Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0570307Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0571222Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0571669Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0572571Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0572917Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0574045Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0574549Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0575475Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0575997Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0576915Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0577333Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0578261Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0578928Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0580848Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.0581182Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0581810Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0583288Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0583691Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0584417Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0584943Z E1204 09:21:45.803000 39023 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.0585361Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0585857Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0586836Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0587314Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0588276Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0588636Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0589565Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0590022Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0591084Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0591495Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0592313Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0592684Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0593501Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0593909Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0595618Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 531562496 and is now 611254272. 
2025-12-04T09:28:37.0595911Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0596473Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0597819Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0598148Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0598750Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0599218Z E1204 09:21:45.807000 39022 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.0599303Z FAILED [9.1546s] [100%] 2025-12-04T09:28:37.0599310Z 2025-12-04T09:28:37.0599442Z =================================== FAILURES =================================== 2025-12-04T09:28:37.0599994Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:37.0600102Z Traceback (most recent call last): 2025-12-04T09:28:37.0600591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.0600688Z self._join_processes(fn) 2025-12-04T09:28:37.0601202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.0601335Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.0601870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.0601969Z raise RuntimeError(error) 2025-12-04T09:28:37.0602188Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.0602291Z Traceback (most recent call last): 2025-12-04T09:28:37.0602815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0602915Z getattr(self, test_name)() 2025-12-04T09:28:37.0603388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0603470Z fn() 2025-12-04T09:28:37.0603914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0604004Z method(*args, **kwargs) 2025-12-04T09:28:37.0604459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0604547Z method(*args, **kwargs) 2025-12-04T09:28:37.0604997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0605082Z with policy(): 2025-12-04T09:28:37.0605528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T09:28:37.0605631Z raise RuntimeError(msg) 2025-12-04T09:28:37.0606952Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.0606958Z 2025-12-04T09:28:37.0607159Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0608102Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0608135Z 2025-12-04T09:28:37.0608368Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0608382Z 2025-12-04T09:28:37.0608391Z 2025-12-04T09:28:37.0608607Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.0608834Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:37.0609671Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-ecd21e7500304b9f.xml - 2025-12-04T09:28:37.0609816Z =========================== short test summary info ============================ 2025-12-04T09:28:37.0610898Z FAILED [9.1546s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.0611003Z Traceback (most recent call last): 2025-12-04T09:28:37.0611488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0611595Z getattr(self, test_name)() 2025-12-04T09:28:37.0612066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0612150Z fn() 2025-12-04T09:28:37.0612592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0612681Z method(*args, **kwargs) 2025-12-04T09:28:37.0613129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0613283Z method(*args, **kwargs) 2025-12-04T09:28:37.0613914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0614013Z with policy(): 2025-12-04T09:28:37.0614594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0614708Z raise RuntimeError(msg) 2025-12-04T09:28:37.0616199Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 
2025-12-04T09:28:37.0616206Z 2025-12-04T09:28:37.0616414Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0617483Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0617491Z 2025-12-04T09:28:37.0617755Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0617936Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.0618107Z ======================= 1 failed, 14 deselected in 9.37s ======================= 2025-12-04T09:28:37.0618200Z Got exit code 1 2025-12-04T09:28:37.0619187Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0619592Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:37.0620358Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-e7cfa143d1c9be09.xml 2025-12-04T09:28:37.0620543Z ============================= test session starts ============================== 2025-12-04T09:28:37.0620889Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.0621028Z cachedir: .pytest_cache 2025-12-04T09:28:37.0621536Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.0621659Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.0621762Z configfile: pytest.ini 2025-12-04T09:28:37.0622289Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.0623538Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0623673Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.0624903Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0625058Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.0625202Z collected 15 items / 5 deselected / 10 selected 2025-12-04T09:28:37.0625341Z stepcurrent: skipping 5 already run items. 
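For context on the leak figures reported above ("Caching allocator allocated memory was 0 and is now reported as 2560", "CUDA driver allocated memory was 531562496 and is now 617545728"): the PYTORCH_TEST_CUDA_MEM_LEAK_CHECK harness snapshots per-device memory before the test body and compares it afterwards, flagging a leak when both the caching-allocator and the driver-level numbers have grown. The sketch below only illustrates that before/after comparison under those assumptions; it is not the actual checker from common_utils.py, and the helper name check_for_leak is made up for this example.

    import torch

    def check_for_leak(fn, device: int = 0) -> None:
        # snapshot caching-allocator and driver-level usage before the test body
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)
        free, total = torch.cuda.mem_get_info(device)
        driver_before = total - free

        fn()  # the test body under scrutiny

        # release cached blocks so a real leak is not hidden by the allocator cache
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free

        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible leak on device {device}: allocator {alloc_before} -> {alloc_after}, "
                f"driver {driver_before} -> {driver_after}"
            )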
2025-12-04T09:28:37.0625448Z Running 10 items in this shard 2025-12-04T09:28:37.0625454Z 2025-12-04T09:28:37.0626817Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda I1204 09:21:52.339000 39301 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 39353 2025-12-04T09:28:37.0627258Z I1204 09:21:52.340000 39301 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 39354 2025-12-04T09:28:37.0627740Z I1204 09:21:52.341000 39301 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 39355 2025-12-04T09:28:37.0628179Z I1204 09:21:52.342000 39301 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 39356 2025-12-04T09:28:37.0630303Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0630408Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0632514Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0632620Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0634718Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0634847Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0637147Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0637280Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0638896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0639026Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0640632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0640747Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0642359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0642529Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0644140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0644254Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0644825Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0645313Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0646260Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0646731Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0647656Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0648017Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0648907Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0649387Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0650291Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0650764Z E1204 09:21:59.663000 39353 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0651668Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0652067Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0652974Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0653488Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0655583Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 640614400 and is now 724500480. 2025-12-04T09:28:37.0655919Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0656544Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0658101Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0658437Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0659130Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0659644Z E1204 09:21:59.663000 39353 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.0660077Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0660580Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0661546Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0662032Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0662991Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0663368Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 
2025-12-04T09:28:37.0664322Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0664779Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0665926Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0666350Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0667235Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0667623Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0668513Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0668945Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0670746Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 
2025-12-04T09:28:37.0671057Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0671707Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0673106Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0673414Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0674059Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0674631Z E1204 09:21:59.664000 39354 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.0675006Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0675449Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0676303Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0676728Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0677573Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0677933Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0678893Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0679540Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0680475Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0680927Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0681858Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0682277Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0683211Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0683666Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0685584Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 498008064 and is now 613351424. 2025-12-04T09:28:37.0686009Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0686638Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0688124Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0688453Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0689141Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0689658Z E1204 09:21:59.684000 39356 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.0690081Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0690588Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0691646Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0692099Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0693030Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0693445Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0694578Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0695036Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0695967Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:28:37.0696425Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0697364Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0697780Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0698721Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0699176Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0701152Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 518979584 and is now 611254272. 2025-12-04T09:28:37.0701500Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0702124Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0703612Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0703945Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0704637Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0705156Z E1204 09:21:59.690000 39355 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.0705250Z FAILED [9.1791s] [ 10%] 2025-12-04T09:28:37.0705256Z 2025-12-04T09:28:37.0705406Z =================================== FAILURES =================================== 2025-12-04T09:28:37.0706206Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.0706324Z Traceback (most recent call last): 2025-12-04T09:28:37.0706801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.0706929Z self._join_processes(fn) 2025-12-04T09:28:37.0707450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.0707575Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.0708142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", 
line 1079, in _check_return_codes 2025-12-04T09:28:37.0708243Z raise RuntimeError(error) 2025-12-04T09:28:37.0708445Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.0708560Z Traceback (most recent call last): 2025-12-04T09:28:37.0709032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0709126Z getattr(self, test_name)() 2025-12-04T09:28:37.0709600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0709678Z fn() 2025-12-04T09:28:37.0710131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0710221Z method(*args, **kwargs) 2025-12-04T09:28:37.0710664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0710765Z method(*args, **kwargs) 2025-12-04T09:28:37.0711206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0711288Z with policy(): 2025-12-04T09:28:37.0711740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0711835Z raise RuntimeError(msg) 2025-12-04T09:28:37.0713264Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 498008064 and is now 613351424. 2025-12-04T09:28:37.0713274Z 2025-12-04T09:28:37.0713461Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0714598Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0714603Z 2025-12-04T09:28:37.0714853Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0714858Z 2025-12-04T09:28:37.0714862Z 2025-12-04T09:28:37.0715062Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.0715318Z Process 3 terminated with exit code 10, terminating remaining processes. 
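Separately from the leak itself, every worker in this run emits the FutureWarning from test_fsdp_dtensor_state_dict.py:189 about FSDP.state_dict_type()/FSDP.set_state_dict_type() being deprecated in favour of torch.distributed.checkpoint.state_dict.get_state_dict()/set_state_dict(). A rough sketch of the replacement the warning points to is below, assuming an already initialised process group; the toy model and optimizer are placeholders, not the test's real module.

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    # placeholder model/optimizer; assumes init_process_group() has already run
    model = FSDP(torch.nn.Linear(8, 8).cuda())
    optim = torch.optim.Adam(model.parameters(), lr=1e-3)

    # extract model and optimizer state without the deprecated
    # FSDP.set_state_dict_type() context manager
    model_sd, optim_sd = get_state_dict(model, optim)

    # ... checkpoint or transform model_sd / optim_sd here ...

    # load both back in a single call
    set_state_dict(model, optim, model_state_dict=model_sd, optim_state_dict=optim_sd)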
2025-12-04T09:28:37.0716195Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-e7cfa143d1c9be09.xml - 2025-12-04T09:28:37.0716363Z =========================== short test summary info ============================ 2025-12-04T09:28:37.0717495Z FAILED [9.1791s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.0717603Z Traceback (most recent call last): 2025-12-04T09:28:37.0718118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0718219Z getattr(self, test_name)() 2025-12-04T09:28:37.0718758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0718839Z fn() 2025-12-04T09:28:37.0719315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0719459Z method(*args, **kwargs) 2025-12-04T09:28:37.0719928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0720028Z method(*args, **kwargs) 2025-12-04T09:28:37.0720494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0720582Z with policy(): 2025-12-04T09:28:37.0721060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0721162Z raise RuntimeError(msg) 2025-12-04T09:28:37.0722558Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 498008064 and is now 613351424. 2025-12-04T09:28:37.0722572Z 2025-12-04T09:28:37.0722771Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0723759Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0723764Z 2025-12-04T09:28:37.0724011Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0724173Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.0724339Z ======================= 1 failed, 5 deselected in 9.39s ======================== 2025-12-04T09:28:37.0724427Z Got exit code 1 2025-12-04T09:28:37.0724521Z Retrying single test... 
2025-12-04T09:28:37.0725284Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-976f30802ad214bb.xml 2025-12-04T09:28:37.0725435Z ============================= test session starts ============================== 2025-12-04T09:28:37.0725755Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.0725856Z cachedir: .pytest_cache 2025-12-04T09:28:37.0726508Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.0726625Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.0726724Z configfile: pytest.ini 2025-12-04T09:28:37.0727239Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.0728470Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0728594Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.0729772Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0729919Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.0730059Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.0731160Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0731293Z Running 1 items in this shard 2025-12-04T09:28:37.0731299Z 2025-12-04T09:28:37.0732664Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda I1204 09:22:06.250000 39634 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 39686 2025-12-04T09:28:37.0733168Z I1204 09:22:06.251000 39634 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 39687 2025-12-04T09:28:37.0733890Z I1204 09:22:06.252000 39634 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 39688 2025-12-04T09:28:37.0734378Z I1204 09:22:06.252000 39634 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 39689 2025-12-04T09:28:37.0736781Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.0736902Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0739275Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0739404Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0741832Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0741947Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0744312Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0744431Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0746191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0746310Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0747993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0748154Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0749758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0749907Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0751502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0751617Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0752031Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0752507Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0753429Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0753875Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0754776Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0755178Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0756052Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0756489Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0757545Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0757992Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0758890Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0759292Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0760211Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0760650Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0762511Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 628031488 and is now 720306176. 
2025-12-04T09:28:37.0762888Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0763504Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0764938Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0765268Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0765932Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0766433Z E1204 09:22:13.496000 39686 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.0766847Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0767331Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0768275Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0768841Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0769795Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0770151Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0771202Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0771653Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0772555Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0773007Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0774149Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0774563Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0775500Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0775957Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0777928Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 529465344 and is now 611254272. 2025-12-04T09:28:37.0778292Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0779104Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0780586Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0780930Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0781620Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0782134Z E1204 09:22:13.496000 39687 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.0782559Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0783055Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0784035Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0784606Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0785560Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0785934Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0786857Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0787315Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0788254Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:28:37.0788713Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0789640Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0790049Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0791075Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0791537Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0793349Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:37.0793705Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0794298Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0795699Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0796012Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0796663Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0797142Z E1204 09:22:13.496000 39689 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.0797542Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0798007Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0798975Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0799526Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0800369Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0800702Z E1204 09:22:13.496000 39688 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0801519Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0801933Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0802755Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0803158Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0803981Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0804346Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0805205Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0805613Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0807349Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 531562496 and is now 611254272. 
2025-12-04T09:28:37.0807641Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0808204Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0809524Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0809820Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0810436Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0810891Z E1204 09:22:13.496000 39688 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.0810989Z FAILED [9.0886s] [100%] 2025-12-04T09:28:37.0810997Z 2025-12-04T09:28:37.0811123Z =================================== FAILURES =================================== 2025-12-04T09:28:37.0811713Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.0811832Z Traceback (most recent call last): 2025-12-04T09:28:37.0812309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.0812415Z self._join_processes(fn) 2025-12-04T09:28:37.0812927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.0813051Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.0813826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.0813944Z raise RuntimeError(error) 2025-12-04T09:28:37.0814188Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.0814303Z Traceback (most recent call last): 2025-12-04T09:28:37.0814843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0814964Z getattr(self, test_name)() 2025-12-04T09:28:37.0815489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0815576Z fn() 2025-12-04T09:28:37.0816086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0816188Z method(*args, **kwargs) 2025-12-04T09:28:37.0816699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0816804Z method(*args, **kwargs) 2025-12-04T09:28:37.0817336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0817439Z with policy(): 2025-12-04T09:28:37.0817950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T09:28:37.0818085Z raise RuntimeError(msg) 2025-12-04T09:28:37.0819578Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 529465344 and is now 611254272. 2025-12-04T09:28:37.0819585Z 2025-12-04T09:28:37.0819796Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0820864Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0820872Z 2025-12-04T09:28:37.0821139Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0821144Z 2025-12-04T09:28:37.0821151Z 2025-12-04T09:28:37.0821372Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.0821629Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:37.0822557Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-976f30802ad214bb.xml - 2025-12-04T09:28:37.0822734Z =========================== short test summary info ============================ 2025-12-04T09:28:37.0823942Z FAILED [9.0886s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.0824068Z Traceback (most recent call last): 2025-12-04T09:28:37.0824679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0824791Z getattr(self, test_name)() 2025-12-04T09:28:37.0825330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0825415Z fn() 2025-12-04T09:28:37.0826025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0826118Z method(*args, **kwargs) 2025-12-04T09:28:37.0826555Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0826653Z method(*args, **kwargs) 2025-12-04T09:28:37.0827092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0827179Z with policy(): 2025-12-04T09:28:37.0827635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0827733Z raise RuntimeError(msg) 2025-12-04T09:28:37.0829068Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 529465344 and is now 611254272. 
2025-12-04T09:28:37.0829074Z 2025-12-04T09:28:37.0829263Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0830208Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0830240Z 2025-12-04T09:28:37.0830473Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0830657Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.0830817Z ======================= 1 failed, 14 deselected in 9.30s ======================= 2025-12-04T09:28:37.0830900Z Got exit code 1 2025-12-04T09:28:37.0830999Z Retrying single test... 2025-12-04T09:28:37.0831670Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2d8b05be053af669.xml 2025-12-04T09:28:37.0831807Z ============================= test session starts ============================== 2025-12-04T09:28:37.0832116Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.0832211Z cachedir: .pytest_cache 2025-12-04T09:28:37.0832666Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.0832786Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.0832877Z configfile: pytest.ini 2025-12-04T09:28:37.0833350Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.0834465Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0834580Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.0835668Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0835804Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.0835996Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.0837004Z stepcurrent: skipping 5 already run items. 
Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0837100Z Running 1 items in this shard 2025-12-04T09:28:37.0837105Z 2025-12-04T09:28:37.0838354Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda I1204 09:22:20.059000 39967 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 40019 2025-12-04T09:28:37.0838792Z I1204 09:22:20.060000 39967 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 40020 2025-12-04T09:28:37.0839237Z I1204 09:22:20.061000 39967 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 40021 2025-12-04T09:28:37.0839667Z I1204 09:22:20.062000 39967 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 40022 2025-12-04T09:28:37.0841801Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0841927Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0844035Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0844158Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0846272Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0846370Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0847898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0848019Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0850116Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0850221Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0851795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0851914Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0853485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0853768Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0855472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.0855608Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0856032Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0856534Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0857543Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0858023Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0859020Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0859385Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0860309Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0860774Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0861706Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0862167Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0863089Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0863515Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0864452Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0865360Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0867342Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 628031488 and is now 722403328. 
2025-12-04T09:28:37.0867638Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0868196Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0869518Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0869820Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0870428Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0870886Z E1204 09:22:27.300000 40019 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.0871266Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0871737Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0872607Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0873054Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0873904Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0874223Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0875045Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0875460Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0876281Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0876693Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0877509Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0877885Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0878908Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0879530Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0881459Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.0881793Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0882437Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0883918Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0884254Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0884936Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0885453Z E1204 09:22:27.300000 40020 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.0885936Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0886440Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0887450Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0887922Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0888889Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0889257Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0890193Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0890661Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0891662Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:28:37.0892070Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0892886Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0893379Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0894467Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0894926Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0896853Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 529465344 and is now 611254272. 2025-12-04T09:28:37.0897187Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0897822Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0899304Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0899637Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0900319Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0900861Z E1204 09:22:27.302000 40022 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.0901289Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0901820Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0902794Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0903266Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0904218Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0904596Z E1204 09:22:27.308000 40021 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0905637Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0906183Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0907001Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0907408Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0908277Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0908646Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0909472Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0909875Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0911587Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 531562496 and is now 611254272. 
2025-12-04T09:28:37.0911882Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0912444Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0913763Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0914064Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0914696Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0915155Z E1204 09:22:27.308000 40021 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.0915277Z FAILED [9.3061s] [100%] 2025-12-04T09:28:37.0915283Z 2025-12-04T09:28:37.0915411Z =================================== FAILURES =================================== 2025-12-04T09:28:37.0915959Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.0916067Z Traceback (most recent call last): 2025-12-04T09:28:37.0916546Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.0916655Z self._join_processes(fn) 2025-12-04T09:28:37.0917166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.0917295Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.0917832Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.0917931Z raise RuntimeError(error) 2025-12-04T09:28:37.0918143Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.0918248Z Traceback (most recent call last): 2025-12-04T09:28:37.0918722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0918827Z getattr(self, test_name)() 2025-12-04T09:28:37.0919298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0919378Z fn() 2025-12-04T09:28:37.0919832Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0919921Z method(*args, **kwargs) 2025-12-04T09:28:37.0920422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0920516Z method(*args, **kwargs) 2025-12-04T09:28:37.0920955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0921045Z with policy(): 2025-12-04T09:28:37.0921491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T09:28:37.0921593Z raise RuntimeError(msg) 2025-12-04T09:28:37.0922915Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 628031488 and is now 722403328. 2025-12-04T09:28:37.0922923Z 2025-12-04T09:28:37.0923115Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0924275Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0924280Z 2025-12-04T09:28:37.0924527Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0924532Z 2025-12-04T09:28:37.0924694Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.0924806Z Traceback (most recent call last): 2025-12-04T09:28:37.0925312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0925456Z getattr(self, test_name)() 2025-12-04T09:28:37.0925957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0926046Z fn() 2025-12-04T09:28:37.0926612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0926708Z method(*args, **kwargs) 2025-12-04T09:28:37.0927189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0927282Z method(*args, **kwargs) 2025-12-04T09:28:37.0927745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0927850Z with policy(): 2025-12-04T09:28:37.0928321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0928430Z raise RuntimeError(msg) 2025-12-04T09:28:37.0929834Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.0929842Z 2025-12-04T09:28:37.0930051Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0931049Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0931054Z 2025-12-04T09:28:37.0931300Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0931307Z 2025-12-04T09:28:37.0931320Z 2025-12-04T09:28:37.0931521Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.0931820Z Process 0 terminated with exit code 10, terminating remaining processes. 
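For context on the figures in the RuntimeError above ("Caching allocator allocated memory was 0 and is now reported as 7680", "CUDA driver allocated memory was ... and is now ..."): the leak check compares per-device memory counters taken before and after the test body and, per the message wording, only reports a leak once the driver-level numbers confirm the allocator-level growth. The sketch below is only an illustration of that comparison using public torch.cuda calls; the real check in torch/testing/_internal/common_utils.py is more involved, and snapshot/assert_no_leak are hypothetical names, not the library's.

    import torch

    def snapshot(device: int) -> tuple[int, int]:
        # (caching-allocator bytes, driver-allocated bytes) for one device
        torch.cuda.synchronize(device)
        allocator = torch.cuda.memory_allocated(device)
        free, total = torch.cuda.mem_get_info(device)  # driver-level view
        return allocator, total - free

    def assert_no_leak(run_test, device: int = 0) -> None:
        # Hypothetical helper mirroring the before/after comparison quoted in
        # the log; not the actual leak-check implementation.
        alloc_before, driver_before = snapshot(device)
        run_test()
        alloc_after, driver_after = snapshot(device)
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: "
                f"allocator {alloc_before} -> {alloc_after}, "
                f"driver {driver_before} -> {driver_after}"
            )

Read this way, the failures above show both counters growing on every rank, and each worker process then exits with code 10.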
2025-12-04T09:28:37.0932707Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2d8b05be053af669.xml - 2025-12-04T09:28:37.0932865Z =========================== short test summary info ============================ 2025-12-04T09:28:37.0934254Z FAILED [9.3061s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.0934385Z Traceback (most recent call last): 2025-12-04T09:28:37.0934930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0935050Z getattr(self, test_name)() 2025-12-04T09:28:37.0935585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0935673Z fn() 2025-12-04T09:28:37.0936184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0936285Z method(*args, **kwargs) 2025-12-04T09:28:37.0936792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0936893Z method(*args, **kwargs) 2025-12-04T09:28:37.0937397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0937496Z with policy(): 2025-12-04T09:28:37.0938002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0938143Z raise RuntimeError(msg) 2025-12-04T09:28:37.0939642Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 628031488 and is now 722403328. 
2025-12-04T09:28:37.0939676Z 2025-12-04T09:28:37.0939892Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0940964Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0940970Z 2025-12-04T09:28:37.0941228Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0941235Z 2025-12-04T09:28:37.0941400Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.0941515Z Traceback (most recent call last): 2025-12-04T09:28:37.0942061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0942176Z getattr(self, test_name)() 2025-12-04T09:28:37.0942706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0942797Z fn() 2025-12-04T09:28:37.0943299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0943398Z method(*args, **kwargs) 2025-12-04T09:28:37.0943908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0944009Z method(*args, **kwargs) 2025-12-04T09:28:37.0944502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0944606Z with policy(): 2025-12-04T09:28:37.0945167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0945287Z raise RuntimeError(msg) 2025-12-04T09:28:37.0946874Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.0946880Z 2025-12-04T09:28:37.0947065Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0948007Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0948014Z 2025-12-04T09:28:37.0948246Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0948410Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:28:37.0948564Z ======================= 1 failed, 14 deselected in 9.52s ======================= 2025-12-04T09:28:37.0948644Z Got exit code 1 2025-12-04T09:28:37.0949515Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.0949875Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:37.0950550Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-ceb5badd22358e55.xml 2025-12-04T09:28:37.0950717Z ============================= test session starts ============================== 2025-12-04T09:28:37.0951022Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.0951149Z cachedir: .pytest_cache 2025-12-04T09:28:37.0951603Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.0951713Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.0951804Z configfile: pytest.ini 2025-12-04T09:28:37.0952271Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.0953387Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0953506Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.0954590Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.0954726Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.0954853Z collected 15 items / 6 deselected / 9 selected 2025-12-04T09:28:37.0954977Z stepcurrent: skipping 6 already run items. 2025-12-04T09:28:37.0955075Z Running 9 items in this shard 2025-12-04T09:28:37.0955080Z 2025-12-04T09:28:37.0956327Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda I1204 09:22:33.860000 40300 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 40352 2025-12-04T09:28:37.0956816Z I1204 09:22:33.861000 40300 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 40353 2025-12-04T09:28:37.0957249Z I1204 09:22:33.861000 40300 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 40354 2025-12-04T09:28:37.0957687Z I1204 09:22:33.862000 40300 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 40355 2025-12-04T09:28:37.0959809Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0959917Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0962022Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0962129Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0964237Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0964370Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0966480Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.0966579Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.0968107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0968230Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0969741Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0969848Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0971358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.0971519Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0973033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.0973142Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.0973752Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0974258Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0975236Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0975720Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0976683Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0977051Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0977979Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0978488Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0979606Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0980125Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0981058Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0981471Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0982415Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0982874Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0984807Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 414121984 and is now 617545728. 2025-12-04T09:28:37.0985142Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0985767Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.0987315Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.0987651Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0988344Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.0988856Z E1204 09:22:41.138000 40355 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.0989282Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.0989780Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.0990838Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.0991265Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.0992110Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.0992440Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.0993297Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0993709Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0994560Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.0994962Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.0995789Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.0996152Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.0996983Z 
E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.0997388Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.0999086Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 636420096 and is now 720306176. 2025-12-04T09:28:37.0999382Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.0999986Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1001304Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1001596Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1002202Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1002659Z E1204 09:22:41.141000 40352 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.1003042Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1003484Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1004336Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1004759Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1005605Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1005956Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1006781Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1007218Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1008044Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1008444Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1009269Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1009639Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1010465Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1010871Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1012564Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.1012910Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1013527Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1015172Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1015504Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1016204Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1016720Z E1204 09:22:41.141000 40354 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.1017146Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1017643Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1018612Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1019094Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1020085Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1020460Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1021415Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1021868Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1022801Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1023252Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1024187Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1024600Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1025537Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1026088Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1027850Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 518979584 and is now 611254272. 
2025-12-04T09:28:37.1028157Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1028709Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1030023Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1030316Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1030932Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1031390Z E1204 09:22:41.142000 40353 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.1031474Z FAILED [9.5970s] [ 11%] 2025-12-04T09:28:37.1031480Z 2025-12-04T09:28:37.1031612Z =================================== FAILURES =================================== 2025-12-04T09:28:37.1032150Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:37.1032260Z Traceback (most recent call last): 2025-12-04T09:28:37.1032735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.1032862Z self._join_processes(fn) 2025-12-04T09:28:37.1033385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.1033536Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.1034078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.1034177Z raise RuntimeError(error) 2025-12-04T09:28:37.1034380Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.1034489Z Traceback (most recent call last): 2025-12-04T09:28:37.1034964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1035058Z getattr(self, test_name)() 2025-12-04T09:28:37.1035536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1035612Z fn() 2025-12-04T09:28:37.1036066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1036158Z method(*args, **kwargs) 2025-12-04T09:28:37.1036600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1036694Z method(*args, **kwargs) 2025-12-04T09:28:37.1037135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1037217Z with policy(): 2025-12-04T09:28:37.1037668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T09:28:37.1037760Z raise RuntimeError(msg) 2025-12-04T09:28:37.1039144Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 636420096 and is now 720306176. 2025-12-04T09:28:37.1039152Z 2025-12-04T09:28:37.1039338Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1040282Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1040288Z 2025-12-04T09:28:37.1040518Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1040522Z 2025-12-04T09:28:37.1040664Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.1040777Z Traceback (most recent call last): 2025-12-04T09:28:37.1041263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1041366Z getattr(self, test_name)() 2025-12-04T09:28:37.1041839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1041915Z fn() 2025-12-04T09:28:37.1042365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1042459Z method(*args, **kwargs) 2025-12-04T09:28:37.1042895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1042990Z method(*args, **kwargs) 2025-12-04T09:28:37.1043424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1043544Z with policy(): 2025-12-04T09:28:37.1043986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1044079Z raise RuntimeError(msg) 2025-12-04T09:28:37.1045411Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 518979584 and is now 611254272. 
2025-12-04T09:28:37.1045445Z 2025-12-04T09:28:37.1045632Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1046575Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1046582Z 2025-12-04T09:28:37.1046809Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1046814Z 2025-12-04T09:28:37.1046961Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.1047067Z Traceback (most recent call last): 2025-12-04T09:28:37.1047547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1047653Z getattr(self, test_name)() 2025-12-04T09:28:37.1048120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1048196Z fn() 2025-12-04T09:28:37.1048642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1048730Z method(*args, **kwargs) 2025-12-04T09:28:37.1049176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1049265Z method(*args, **kwargs) 2025-12-04T09:28:37.1049706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1049846Z with policy(): 2025-12-04T09:28:37.1050298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1050392Z raise RuntimeError(msg) 2025-12-04T09:28:37.1051723Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.1051728Z 2025-12-04T09:28:37.1051914Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1052857Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1052866Z 2025-12-04T09:28:37.1053095Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1053102Z 2025-12-04T09:28:37.1053106Z 2025-12-04T09:28:37.1053372Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.1053770Z Process 0 terminated with exit code 10, terminating remaining processes. 
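The failure block above ends with the same repro instructions as the first one: PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 keeps the leak check enabled, and PYTORCH_PRINT_REPRO_ON_FAILURE=0 silences the repro banner. The small wrapper below is only a convenience sketch around the command quoted verbatim in the log; TEST_FILE and TEST_ID are copied from the failure above, and the wrapper itself is not part of the test suite.

    import os
    import subprocess

    TEST_FILE = "test/distributed/fsdp/test_fsdp_dtensor_state_dict.py"
    TEST_ID = (
        "TestFSDPWithDeviceMeshAndDTensorCUDA."
        "test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda"
    )

    env = dict(
        os.environ,
        PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",   # keep the leak check on
        PYTORCH_PRINT_REPRO_ON_FAILURE="0",     # optional: hide the repro banner
    )
    # Run from the base repo dir, as the log instructs.
    subprocess.run(["python", TEST_FILE, TEST_ID], env=env, check=False)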
2025-12-04T09:28:37.1054702Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-ceb5badd22358e55.xml - 2025-12-04T09:28:37.1054875Z =========================== short test summary info ============================ 2025-12-04T09:28:37.1056086Z FAILED [9.5970s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.1056254Z Traceback (most recent call last): 2025-12-04T09:28:37.1056826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1056934Z getattr(self, test_name)() 2025-12-04T09:28:37.1057470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1057558Z fn() 2025-12-04T09:28:37.1058063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1058164Z method(*args, **kwargs) 2025-12-04T09:28:37.1058661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1058770Z method(*args, **kwargs) 2025-12-04T09:28:37.1059270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1059375Z with policy(): 2025-12-04T09:28:37.1059879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1059980Z raise RuntimeError(msg) 2025-12-04T09:28:37.1061479Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 636420096 and is now 720306176. 
2025-12-04T09:28:37.1061485Z 2025-12-04T09:28:37.1061694Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1062817Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1062824Z 2025-12-04T09:28:37.1063090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1063095Z 2025-12-04T09:28:37.1063253Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.1063381Z Traceback (most recent call last): 2025-12-04T09:28:37.1063922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1064036Z getattr(self, test_name)() 2025-12-04T09:28:37.1064564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1064647Z fn() 2025-12-04T09:28:37.1065156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1065258Z method(*args, **kwargs) 2025-12-04T09:28:37.1065761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1065868Z method(*args, **kwargs) 2025-12-04T09:28:37.1066417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1066503Z with policy(): 2025-12-04T09:28:37.1066949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1067045Z raise RuntimeError(msg) 2025-12-04T09:28:37.1068362Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 518979584 and is now 611254272. 
2025-12-04T09:28:37.1068396Z 2025-12-04T09:28:37.1068584Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1069550Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1069556Z 2025-12-04T09:28:37.1069786Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1069790Z 2025-12-04T09:28:37.1070106Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.1070214Z Traceback (most recent call last): 2025-12-04T09:28:37.1070724Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1070836Z getattr(self, test_name)() 2025-12-04T09:28:37.1071332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1071417Z fn() 2025-12-04T09:28:37.1071895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1071990Z method(*args, **kwargs) 2025-12-04T09:28:37.1072464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1072555Z method(*args, **kwargs) 2025-12-04T09:28:37.1073022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1073117Z with policy(): 2025-12-04T09:28:37.1073592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1073694Z raise RuntimeError(msg) 2025-12-04T09:28:37.1075170Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.1075178Z 2025-12-04T09:28:37.1075375Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1076375Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1076381Z 2025-12-04T09:28:37.1076624Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1076797Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.1076960Z ======================= 1 failed, 6 deselected in 9.81s ======================== 2025-12-04T09:28:37.1077052Z Got exit code 1 2025-12-04T09:28:37.1077162Z Retrying single test... 
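Before the retry output below: the FutureWarning emitted four times during this run points callers away from FSDP.state_dict_type()/FSDP.set_state_dict_type() and toward the torch.distributed.checkpoint.state_dict helpers it names. A minimal sketch of that replacement follows, assuming an already-wrapped FSDP model and its optimizer; the function and variable names are placeholders, not taken from the failing test, and the exact options should be checked against the API doc linked in the warning.

    import torch
    from torch.distributed.checkpoint.state_dict import (
        StateDictOptions,
        get_state_dict,
        set_state_dict,
    )

    def roundtrip_state(model: torch.nn.Module, optim: torch.optim.Optimizer):
        # cpu_offload mirrors the offload_to_cpu parameterization exercised by
        # these tests; sharded (non-full) state dicts are the default.
        options = StateDictOptions(cpu_offload=True)
        model_sd, optim_sd = get_state_dict(model, optim, options=options)

        # ... persist model_sd / optim_sd with a checkpointing mechanism ...

        set_state_dict(
            model,
            optim,
            model_state_dict=model_sd,
            optim_state_dict=optim_sd,
            options=options,
        )

Per the warning text, the same helpers are meant to cover FSDP1, FSDP2 and DDP, so the replacement is not FSDP-specific.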
2025-12-04T09:28:37.1077877Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-f5dcf7c66579f3c2.xml 2025-12-04T09:28:37.1078043Z ============================= test session starts ============================== 2025-12-04T09:28:37.1078537Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.1078780Z cachedir: .pytest_cache 2025-12-04T09:28:37.1079463Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.1079587Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.1079691Z configfile: pytest.ini 2025-12-04T09:28:37.1080305Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.1081569Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.1081758Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.1082982Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.1083138Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.1083363Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.1084537Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1084852Z Running 1 items in this shard 2025-12-04T09:28:37.1084858Z 2025-12-04T09:28:37.1086329Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda I1204 09:22:47.670000 40633 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 40685 2025-12-04T09:28:37.1087000Z I1204 09:22:47.671000 40633 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 40686 2025-12-04T09:28:37.1087532Z I1204 09:22:47.671000 40633 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 40687 2025-12-04T09:28:37.1088057Z I1204 09:22:47.672000 40633 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 40688 2025-12-04T09:28:37.1090594Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.1090793Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1093429Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1093753Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1096220Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1096374Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1098852Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1099088Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1100896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1101063Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1102875Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1103024Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1104804Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1105152Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1106840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1107119Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1107579Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1108222Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1109333Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1109821Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1110821Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1111215Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1112155Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1112656Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1113624Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1114135Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1115048Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1115555Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1116503Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1117040Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1118808Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 531562496 and is now 617545728. 
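The RuntimeError above reports two quantities per device: caching-allocator bytes (what torch.cuda.memory_allocated tracks) and driver-level allocated bytes. A minimal sketch of that kind of before/after comparison, purely illustrative and not the actual leak checker in common_utils.py; `run_the_test` is a hypothetical placeholder for the test body under check:

    import torch

    def cuda_mem_snapshot(device):
        # Caching-allocator bytes plus a driver-level estimate for one device.
        torch.cuda.synchronize(device)
        allocator = torch.cuda.memory_allocated(device)
        free, total = torch.cuda.mem_get_info(device)
        return allocator, total - free

    before = [cuda_mem_snapshot(d) for d in range(torch.cuda.device_count())]
    run_the_test()  # hypothetical placeholder for the checked test body
    after = [cuda_mem_snapshot(d) for d in range(torch.cuda.device_count())]

    for dev, ((alloc0, drv0), (alloc1, drv1)) in enumerate(zip(before, after)):
        if alloc1 > alloc0:
            raise RuntimeError(
                f"possible CUDA leak on device {dev}: caching allocator "
                f"{alloc0} -> {alloc1} bytes, driver {drv0} -> {drv1} bytes")
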
2025-12-04T09:28:37.1119190Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1119784Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1121162Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1121536Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1122255Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1122797Z E1204 09:22:54.981000 40688 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.1123206Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1123793Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1124668Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1125241Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1126134Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1126498Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1127406Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1127848Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1128766Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1129254Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1130188Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1130594Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1131456Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1131937Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1133983Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 628031488 and is now 722403328. 2025-12-04T09:28:37.1134437Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1135104Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1136680Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1137126Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1137913Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1138486Z E1204 09:22:54.982000 40685 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.1138944Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1139598Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1140603Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1141154Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1142191Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1142618Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1143639Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1144176Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1145196Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:28:37.1145802Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1146875Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1147290Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1148151Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1148649Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1150383Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 527368192 and is now 611254272. 2025-12-04T09:28:37.1150779Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1151392Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1152858Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1153192Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1153882Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1154354Z E1204 09:22:54.984000 40686 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.1154802Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1155416Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1156314Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1156822Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1157708Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1158137Z E1204 09:22:54.991000 40687 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1159015Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1159488Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1160419Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1160858Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1161750Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1162185Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1163113Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1163556Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1165291Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 523173888 and is now 611254272. 
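For scale, the driver-side growth the four ranks report above works out to 80-90 MiB each, while the caching allocator itself only shows 2,560 stray bytes per device:

    # Driver-allocated deltas from the four RuntimeError messages above (bytes -> MiB):
    deltas = {
        0: 722403328 - 628031488,  # 94_371_840 B = 90.0 MiB
        1: 611254272 - 527368192,  # 83_886_080 B = 80.0 MiB
        2: 611254272 - 523173888,  # 88_080_384 B = 84.0 MiB
        3: 617545728 - 531562496,  # 85_983_232 B = 82.0 MiB
    }
    print({dev: delta / 2**20 for dev, delta in deltas.items()})
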
2025-12-04T09:28:37.1165677Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1166268Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1167789Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1168128Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1168830Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1169322Z E1204 09:22:54.991000 40687 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.1169536Z FAILED [9.1708s] [100%] 2025-12-04T09:28:37.1169542Z 2025-12-04T09:28:37.1169753Z =================================== FAILURES =================================== 2025-12-04T09:28:37.1170359Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:37.1170553Z Traceback (most recent call last): 2025-12-04T09:28:37.1171083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.1171221Z self._join_processes(fn) 2025-12-04T09:28:37.1171798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.1171993Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.1172624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.1172808Z raise RuntimeError(error) 2025-12-04T09:28:37.1173051Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.1173328Z Traceback (most recent call last): 2025-12-04T09:28:37.1174028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1174322Z getattr(self, test_name)() 2025-12-04T09:28:37.1174906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1175039Z fn() 2025-12-04T09:28:37.1175631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1175776Z method(*args, **kwargs) 2025-12-04T09:28:37.1176293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1176550Z method(*args, **kwargs) 2025-12-04T09:28:37.1177094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1177338Z with policy(): 2025-12-04T09:28:37.1177884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T09:28:37.1178034Z raise RuntimeError(msg) 2025-12-04T09:28:37.1179823Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 628031488 and is now 722403328. 2025-12-04T09:28:37.1179832Z 2025-12-04T09:28:37.1180120Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1181278Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1181284Z 2025-12-04T09:28:37.1181686Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1181695Z 2025-12-04T09:28:37.1181699Z 2025-12-04T09:28:37.1182007Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.1182285Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:37.1183311Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-f5dcf7c66579f3c2.xml - 2025-12-04T09:28:37.1183590Z =========================== short test summary info ============================ 2025-12-04T09:28:37.1184840Z FAILED [9.1708s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.1185058Z Traceback (most recent call last): 2025-12-04T09:28:37.1185650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1185870Z getattr(self, test_name)() 2025-12-04T09:28:37.1186463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1186593Z fn() 2025-12-04T09:28:37.1187188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1187334Z method(*args, **kwargs) 2025-12-04T09:28:37.1187887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1188136Z method(*args, **kwargs) 2025-12-04T09:28:37.1188701Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1188942Z with policy(): 2025-12-04T09:28:37.1189529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1189689Z raise RuntimeError(msg) 2025-12-04T09:28:37.1191326Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 628031488 and is now 722403328. 
2025-12-04T09:28:37.1191333Z 2025-12-04T09:28:37.1191610Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1192712Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1192718Z 2025-12-04T09:28:37.1193006Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1193266Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.1193467Z ======================= 1 failed, 14 deselected in 9.38s ======================= 2025-12-04T09:28:37.1193575Z Got exit code 1 2025-12-04T09:28:37.1193812Z Retrying single test... 2025-12-04T09:28:37.1194562Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-21e2c8920cf3865d.xml 2025-12-04T09:28:37.1194749Z ============================= test session starts ============================== 2025-12-04T09:28:37.1195173Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.1195313Z cachedir: .pytest_cache 2025-12-04T09:28:37.1195955Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.1196129Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.1196279Z configfile: pytest.ini 2025-12-04T09:28:37.1196867Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.1198086Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.1198275Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.1199502Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.1199760Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.1199998Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.1201105Z stepcurrent: skipping 6 already run items. 
Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1201296Z Running 1 items in this shard 2025-12-04T09:28:37.1201302Z 2025-12-04T09:28:37.1202747Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda I1204 09:23:01.579000 40966 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 41018 2025-12-04T09:28:37.1203362Z I1204 09:23:01.580000 40966 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 41019 2025-12-04T09:28:37.1203839Z I1204 09:23:01.581000 40966 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 41020 2025-12-04T09:28:37.1204337Z I1204 09:23:01.582000 40966 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 41021 2025-12-04T09:28:37.1206540Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1206679Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1208915Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1209070Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1210677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1210830Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1213096Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1213298Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1215975Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1216131Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1217932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1218100Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1219953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1220190Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1222269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.1222484Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1222964Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1223557Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1224549Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1225173Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1226350Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1226716Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1227630Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1228077Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1229041Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1229504Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1230418Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1230824Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1231688Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1232166Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1233928Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 640614400 and is now 720306176. 
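The repro line that keeps appearing in these failures can also be driven from Python rather than the shell. A hedged equivalent of the printed command, run from the base repo dir and using only the environment variables the log itself mentions (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK to enable the check; PYTORCH_PRINT_REPRO_ON_FAILURE=0 would silence the repro banner):

    import os
    import subprocess
    import sys

    env = dict(
        os.environ,
        PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",   # enable the per-test CUDA leak check
        # PYTORCH_PRINT_REPRO_ON_FAILURE="0",   # optional: suppress the repro message
    )
    subprocess.run(
        [
            sys.executable,
            "test/distributed/fsdp/test_fsdp_dtensor_state_dict.py",
            "TestFSDPWithDeviceMeshAndDTensorCUDA."
            "test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda",
        ],
        env=env,
        check=False,  # inspect returncode yourself; the CI run above got exit code 1
    )
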
2025-12-04T09:28:37.1234337Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1234994Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1236427Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1236812Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1237516Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1238036Z E1204 09:23:08.863000 41018 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.1238445Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1238972Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1239875Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1240362Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1241289Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1241664Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1242568Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1243060Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1243979Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1244398Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1245354Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1245760Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1246622Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1247113Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1248862Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:37.1249283Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1249973Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1251394Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1251735Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1252429Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1252902Z E1204 09:23:08.865000 41019 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.1253445Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1254209Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1255225Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1255804Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1256799Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1257269Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1258327Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1258839Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1259856Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:28:37.1260352Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1261345Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1261842Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1262896Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1263399Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1265347Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 481230848 and is now 611254272. 2025-12-04T09:28:37.1265858Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1266569Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1268043Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1268377Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1269068Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1269571Z E1204 09:23:08.865000 41021 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.1269980Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1270527Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1271628Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1272162Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1273105Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1273597Z E1204 09:23:08.865000 41020 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1274492Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1275052Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1275961Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1276427Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1277386Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1277827Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1284895Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1285441Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1287378Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 518979584 and is now 611254272. 
2025-12-04T09:28:37.1287873Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1288501Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1289992Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1290327Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1291131Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1291720Z E1204 09:23:08.865000 41020 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.1291810Z FAILED [9.2256s] [100%] 2025-12-04T09:28:37.1291824Z 2025-12-04T09:28:37.1291950Z =================================== FAILURES =================================== 2025-12-04T09:28:37.1292490Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:37.1292603Z Traceback (most recent call last): 2025-12-04T09:28:37.1293087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.1293269Z self._join_processes(fn) 2025-12-04T09:28:37.1294092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.1294235Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.1294846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.1294956Z raise RuntimeError(error) 2025-12-04T09:28:37.1295185Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.1295313Z Traceback (most recent call last): 2025-12-04T09:28:37.1295846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1295952Z getattr(self, test_name)() 2025-12-04T09:28:37.1296483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1296570Z fn() 2025-12-04T09:28:37.1297080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1297183Z method(*args, **kwargs) 2025-12-04T09:28:37.1297683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1297792Z method(*args, **kwargs) 2025-12-04T09:28:37.1298290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1298383Z with policy(): 2025-12-04T09:28:37.1298893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T09:28:37.1298997Z raise RuntimeError(msg) 2025-12-04T09:28:37.1300496Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 640614400 and is now 720306176. 2025-12-04T09:28:37.1300562Z 2025-12-04T09:28:37.1300775Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1301842Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1301848Z 2025-12-04T09:28:37.1302109Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1302114Z 2025-12-04T09:28:37.1302119Z 2025-12-04T09:28:37.1302341Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.1302606Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:37.1303532Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-21e2c8920cf3865d.xml - 2025-12-04T09:28:37.1303710Z =========================== short test summary info ============================ 2025-12-04T09:28:37.1304921Z FAILED [9.2256s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.1305044Z Traceback (most recent call last): 2025-12-04T09:28:37.1305693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1305795Z getattr(self, test_name)() 2025-12-04T09:28:37.1306400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1306474Z fn() 2025-12-04T09:28:37.1307027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1307123Z method(*args, **kwargs) 2025-12-04T09:28:37.1307561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1307658Z method(*args, **kwargs) 2025-12-04T09:28:37.1308100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1308183Z with policy(): 2025-12-04T09:28:37.1308633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1308723Z raise RuntimeError(msg) 2025-12-04T09:28:37.1310055Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 640614400 and is now 720306176. 
2025-12-04T09:28:37.1310064Z 2025-12-04T09:28:37.1310252Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1311183Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1311195Z 2025-12-04T09:28:37.1311429Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1311581Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.1311764Z ======================= 1 failed, 14 deselected in 9.44s ======================= 2025-12-04T09:28:37.1311844Z Got exit code 1 2025-12-04T09:28:37.1312717Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.1313103Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:37.1313773Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-3dd1dab0649736e8.xml 2025-12-04T09:28:37.1313916Z ============================= test session starts ============================== 2025-12-04T09:28:37.1314220Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.1314313Z cachedir: .pytest_cache 2025-12-04T09:28:37.1314772Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.1314882Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.1314979Z configfile: pytest.ini 2025-12-04T09:28:37.1315454Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.1316567Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.1316689Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.1317767Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.1317909Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.1318032Z collected 15 items / 7 deselected / 8 selected 2025-12-04T09:28:37.1318152Z stepcurrent: skipping 7 already run items. 
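What the harness is doing between the two sessions above: the failing test is re-run once in isolation, and because it fails again it is marked FAILED CONSISTENTLY, while continue-through-error lets the remaining items in the shard proceed. A hypothetical sketch of that policy, illustrative only and not PyTorch's actual test driver:

    def run_with_retry(run_single_test, test_id, continue_through_error=True):
        # run_single_test is a hypothetical callable returning the pytest exit code.
        if run_single_test(test_id) == 0:
            return True
        print("Got exit code 1")
        print("Retrying single test...")
        if run_single_test(test_id) == 0:
            return True  # passed on retry: treated as flaky rather than failed
        print(f"FAILED CONSISTENTLY: {test_id}")
        if not continue_through_error:
            raise SystemExit(1)
        # continue-through-error is set, so the rest of the shard still runs
        return False
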
2025-12-04T09:28:37.1318322Z Running 8 items in this shard 2025-12-04T09:28:37.1318329Z 2025-12-04T09:28:37.1319573Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda I1204 09:23:15.439000 41299 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 41351 2025-12-04T09:28:37.1320017Z I1204 09:23:15.440000 41299 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 41352 2025-12-04T09:28:37.1320449Z I1204 09:23:15.441000 41299 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 41353 2025-12-04T09:28:37.1320882Z I1204 09:23:15.442000 41299 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 41354 2025-12-04T09:28:37.1323018Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1323118Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1325221Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1325355Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1326890Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1327027Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1329152Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1329248Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1331347Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1331443Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1333017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1333131Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1335024Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1335154Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1336857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1336989Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1337418Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1337930Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1338904Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1339379Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1340381Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1340772Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1341709Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1342162Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1343096Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1343553Z E1204 09:23:22.772000 41351 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1344480Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1344901Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1345927Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1346341Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1348086Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 640614400 and is now 724500480. 2025-12-04T09:28:37.1348393Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1348945Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1350254Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1350555Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1351166Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1351631Z E1204 09:23:22.772000 41351 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.1352006Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1352453Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1353309Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1353759Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1354617Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1354967Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 
2025-12-04T09:28:37.1355795Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1356196Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1357016Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1357430Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1358253Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1358621Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1359442Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1359862Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1361602Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 481230848 and is now 613351424. 
2025-12-04T09:28:37.1361908Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1362462Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1363779Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1364087Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1364694Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1365163Z E1204 09:23:22.774000 41354 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.1365535Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1365974Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1366861Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1367283Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1368172Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1368493Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1369316Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1369720Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1370539Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1370951Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1371768Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1372141Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1372962Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1373512Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1375575Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 518979584 and is now 611254272. 2025-12-04T09:28:37.1375910Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1376534Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1378012Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1378353Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1379218Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1379740Z E1204 09:23:22.775000 41352 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.1380156Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1380713Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1381687Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1382200Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1383155Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1383517Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1384450Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1384909Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1385841Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:28:37.1386297Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1387224Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1387644Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1388639Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1389106Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1391103Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:37.1391413Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1392009Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1393404Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1393722Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1394364Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1394850Z E1204 09:23:22.776000 41353 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.1394972Z FAILED [9.1817s] [ 12%] 2025-12-04T09:28:37.1394978Z 2025-12-04T09:28:37.1395111Z =================================== FAILURES =================================== 2025-12-04T09:28:37.1395693Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.1395831Z Traceback (most recent call last): 2025-12-04T09:28:37.1396344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.1396611Z self._join_processes(fn) 2025-12-04T09:28:37.1397170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.1397312Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.1397893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
1079, in _check_return_codes 2025-12-04T09:28:37.1398004Z raise RuntimeError(error) 2025-12-04T09:28:37.1398234Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.1398348Z Traceback (most recent call last): 2025-12-04T09:28:37.1398875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1398977Z getattr(self, test_name)() 2025-12-04T09:28:37.1399490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1399579Z fn() 2025-12-04T09:28:37.1400066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1400162Z method(*args, **kwargs) 2025-12-04T09:28:37.1400648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1400766Z method(*args, **kwargs) 2025-12-04T09:28:37.1401309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1401401Z with policy(): 2025-12-04T09:28:37.1401888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1402000Z raise RuntimeError(msg) 2025-12-04T09:28:37.1403446Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 640614400 and is now 724500480. 2025-12-04T09:28:37.1403453Z 2025-12-04T09:28:37.1403664Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1404684Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1404693Z 2025-12-04T09:28:37.1404956Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1404961Z 2025-12-04T09:28:37.1404966Z 2025-12-04T09:28:37.1405174Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.1405421Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:37.1406330Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-3dd1dab0649736e8.xml - 2025-12-04T09:28:37.1406492Z =========================== short test summary info ============================ 2025-12-04T09:28:37.1407695Z FAILED [9.1817s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.1407810Z Traceback (most recent call last): 2025-12-04T09:28:37.1408370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1408477Z getattr(self, test_name)() 2025-12-04T09:28:37.1408993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1409080Z fn() 2025-12-04T09:28:37.1409565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1409662Z method(*args, **kwargs) 2025-12-04T09:28:37.1410152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1410254Z method(*args, **kwargs) 2025-12-04T09:28:37.1410739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1410840Z with policy(): 2025-12-04T09:28:37.1411331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1411436Z raise RuntimeError(msg) 2025-12-04T09:28:37.1412875Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 640614400 and is now 724500480. 2025-12-04T09:28:37.1412881Z 2025-12-04T09:28:37.1413085Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1414431Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1414440Z 2025-12-04T09:28:37.1414701Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1414883Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.1415054Z ======================= 1 failed, 7 deselected in 9.40s ======================== 2025-12-04T09:28:37.1415147Z Got exit code 1 2025-12-04T09:28:37.1415255Z Retrying single test... 
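[Note] The repro command printed above runs the failing test under PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, which compares CUDA memory before and after each test and raises the "confirmed a leak" RuntimeError seen here when allocations grow. A rough sketch of that before/after idea, for local inspection only (the real guard lives in torch/testing/_internal/common_utils.py and also consults driver-level allocations), assuming a CUDA-capable machine:

    import torch

    # Illustrative only: compare caching-allocator usage around a callable,
    # in the spirit of PyTorch's CUDA memory-leak check.
    def check_for_cuda_leak(fn, device=0):
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        before = torch.cuda.memory_allocated(device)

        fn()  # run the suspect test body / workload

        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        after = torch.cuda.memory_allocated(device)
        if after > before:
            raise RuntimeError(
                f"possible leak on device {device}: {before} -> {after} bytes"
            )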
2025-12-04T09:28:37.1416011Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-255818cdbe5fbd05.xml 2025-12-04T09:28:37.1416173Z ============================= test session starts ============================== 2025-12-04T09:28:37.1416519Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.1416618Z cachedir: .pytest_cache 2025-12-04T09:28:37.1417140Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.1417260Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.1417370Z configfile: pytest.ini 2025-12-04T09:28:37.1417905Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.1419151Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.1419294Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.1420557Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.1420747Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.1420890Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.1422024Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1423432Z Running 1 items in this shard 2025-12-04T09:28:37.1423650Z 2025-12-04T09:28:37.1425052Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda I1204 09:23:29.360000 41632 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 41684 2025-12-04T09:28:37.1427136Z I1204 09:23:29.361000 41632 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 41685 2025-12-04T09:28:37.1428227Z I1204 09:23:29.362000 41632 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 41686 2025-12-04T09:28:37.1429294Z I1204 09:23:29.362000 41632 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 41687 2025-12-04T09:28:37.1432221Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.1434752Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1437298Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1439844Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1442332Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1444861Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1447347Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1449858Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1451732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1453935Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1455963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1457928Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1459893Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1461855Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1463798Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1465920Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1466671Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1467673Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1469250Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1470728Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1472211Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1473589Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1474937Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1476366Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1477792Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1479538Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1481041Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1482511Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1484050Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1485571Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1488137Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 
2025-12-04T09:28:37.1490497Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1491679Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1494035Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1495967Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1497114Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1498447Z E1204 09:23:36.644000 41685 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.1499507Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1500644Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1502252Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1503832Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1505394Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1506988Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1508403Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1509740Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1511310Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1512724Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1514360Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1515815Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1517249Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1518752Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1521172Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.1523534Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1524558Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1526663Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1528378Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1529397Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1530578Z E1204 09:23:36.645000 41687 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.1531587Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1532528Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1534218Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1535780Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1537354Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1538804Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1540238Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1541743Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1543258Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:28:37.1544765Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1546467Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1547765Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1549102Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1550450Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1552664Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 518979584 and is now 613351424. 2025-12-04T09:28:37.1554764Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1555739Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1557719Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1559429Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1560452Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1561686Z E1204 09:23:36.646000 41686 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.1562629Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1563557Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1564968Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1566354Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1567743Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1569029Z E1204 09:23:36.651000 41684 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1570299Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1571643Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1572989Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1574691Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1576208Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1577725Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1579407Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1580928Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1583430Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 632225792 and is now 720306176. 
2025-12-04T09:28:37.1585794Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1586884Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1589118Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1591113Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1592223Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1593401Z E1204 09:23:36.651000 41684 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.1594060Z FAILED [9.2041s] [100%] 2025-12-04T09:28:37.1594212Z 2025-12-04T09:28:37.1594348Z =================================== FAILURES =================================== 2025-12-04T09:28:37.1595137Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.1595890Z Traceback (most recent call last): 2025-12-04T09:28:37.1596572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.1597265Z self._join_processes(fn) 2025-12-04T09:28:37.1597954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.1598711Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.1599479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.1600229Z raise RuntimeError(error) 2025-12-04T09:28:37.1600608Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.1601026Z Traceback (most recent call last): 2025-12-04T09:28:37.1601707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1602387Z getattr(self, test_name)() 2025-12-04T09:28:37.1603038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1603740Z fn() 2025-12-04T09:28:37.1604300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1604989Z method(*args, **kwargs) 2025-12-04T09:28:37.1605610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1606267Z method(*args, **kwargs) 2025-12-04T09:28:37.1606869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1607520Z with policy(): 2025-12-04T09:28:37.1608111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T09:28:37.1608772Z raise RuntimeError(msg) 2025-12-04T09:28:37.1610259Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.1611692Z 2025-12-04T09:28:37.1611878Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1613112Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1614476Z 2025-12-04T09:28:37.1614747Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1615141Z 2025-12-04T09:28:37.1615146Z 2025-12-04T09:28:37.1615372Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.1615975Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:37.1617363Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-255818cdbe5fbd05.xml - 2025-12-04T09:28:37.1618583Z =========================== short test summary info ============================ 2025-12-04T09:28:37.1620089Z FAILED [9.2041s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.1621518Z Traceback (most recent call last): 2025-12-04T09:28:37.1622288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1623068Z getattr(self, test_name)() 2025-12-04T09:28:37.1623811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1624552Z fn() 2025-12-04T09:28:37.1625189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1626038Z method(*args, **kwargs) 2025-12-04T09:28:37.1626644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1627295Z method(*args, **kwargs) 2025-12-04T09:28:37.1627909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1628556Z with policy(): 2025-12-04T09:28:37.1629134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1629791Z raise RuntimeError(msg) 2025-12-04T09:28:37.1631313Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 
2025-12-04T09:28:37.1632767Z 2025-12-04T09:28:37.1632960Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1634189Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1635244Z 2025-12-04T09:28:37.1635477Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1635984Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.1636414Z ======================= 1 failed, 14 deselected in 9.42s ======================= 2025-12-04T09:28:37.1636762Z Got exit code 1 2025-12-04T09:28:37.1636991Z Retrying single test... 2025-12-04T09:28:37.1637818Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2340b7a625d10704.xml 2025-12-04T09:28:37.1638741Z ============================= test session starts ============================== 2025-12-04T09:28:37.1639297Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.1639807Z cachedir: .pytest_cache 2025-12-04T09:28:37.1640426Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.1641095Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.1641392Z configfile: pytest.ini 2025-12-04T09:28:37.1642023Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.1643852Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.1645197Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.1646518Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.1647849Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.1648231Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.1649465Z stepcurrent: skipping 7 already run items. 
Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1650674Z Running 1 items in this shard 2025-12-04T09:28:37.1650863Z 2025-12-04T09:28:37.1652102Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda I1204 09:23:43.199000 41965 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 42017 2025-12-04T09:28:37.1654195Z I1204 09:23:43.200000 41965 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 42018 2025-12-04T09:28:37.1655319Z I1204 09:23:43.201000 41965 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 42019 2025-12-04T09:28:37.1656424Z I1204 09:23:43.202000 41965 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 42020 2025-12-04T09:28:37.1659494Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1662137Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1664707Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1667259Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1669535Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1671845Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1673532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1675275Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1677657Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:189: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1680452Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1682366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1684345Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1686294Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1688251Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1690195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.1692208Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1692815Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1694070Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1695684Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1697268Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1698845Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1700302Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1701725Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1703239Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1704760Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1706382Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1707925Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1709355Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1710787Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1712266Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1714698Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 640614400 and is now 726597632. 
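The RuntimeError above is produced by PyTorch's CUDA memory-leak check (enabled in this job via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1): it snapshots caching-allocator and driver-level memory before the test and re-checks them afterwards, failing if either grew. The sketch below illustrates that idea only; it is not the actual common_utils harness, and run_with_leak_probe/fn are hypothetical names.

    import torch

    def run_with_leak_probe(fn, device=0):
        # Hypothetical sketch, not PyTorch's CudaMemoryLeakCheck implementation.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)      # caching-allocator bytes in use
        free_before, _total = torch.cuda.mem_get_info(device)   # driver-level free memory
        fn()                                                     # the test body under check
        torch.cuda.synchronize(device)
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        if alloc_after > alloc_before or free_after < free_before:
            raise RuntimeError(
                f"possible leak: allocator {alloc_before} -> {alloc_after} bytes, "
                f"driver free {free_before} -> {free_after} bytes"
            )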
2025-12-04T09:28:37.1716988Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1718043Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1720212Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1722085Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1723236Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1724532Z E1204 09:23:50.477000 42017 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.1725598Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1726628Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1728183Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1729706Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1731231Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1732639Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1734257Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1735772Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1737287Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1738806Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1740391Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1741863Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1743344Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1744968Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1747459Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 523173888 and is now 615448576. 2025-12-04T09:28:37.1749675Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1750712Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1752810Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1754675Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1755698Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1756921Z E1204 09:23:50.480000 42019 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.1757874Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1758812Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1760232Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1761626Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1763022Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1764309Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1765561Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1766907Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1768305Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:28:37.1769647Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1771179Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1772557Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1774192Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1775714Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1778223Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 282001408 and is now 611254272. 2025-12-04T09:28:37.1780931Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1782021Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1784260Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1786744Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1787892Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1789234Z E1204 09:23:50.483000 42020 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.1790288Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.1791447Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.1792968Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1794450Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.1796097Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1797507Z E1204 09:23:50.488000 42018 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.1798894Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1800369Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1802165Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1803638Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.1805113Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1806547Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.1808163Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1809537Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.1812009Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 531562496 and is now 611254272. 
2025-12-04T09:28:37.1814679Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1815777Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1818057Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1820025Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.1821169Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1822496Z E1204 09:23:50.488000 42018 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.1823253Z FAILED [9.3968s] [100%] 2025-12-04T09:28:37.1823429Z 2025-12-04T09:28:37.1823581Z =================================== FAILURES =================================== 2025-12-04T09:28:37.1824469Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.1825330Z Traceback (most recent call last): 2025-12-04T09:28:37.1826300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.1826993Z self._join_processes(fn) 2025-12-04T09:28:37.1827684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.1828439Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.1829206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.1829964Z raise RuntimeError(error) 2025-12-04T09:28:37.1830344Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.1830774Z Traceback (most recent call last): 2025-12-04T09:28:37.1831511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1832197Z getattr(self, test_name)() 2025-12-04T09:28:37.1832843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1833515Z fn() 2025-12-04T09:28:37.1834073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1834721Z method(*args, **kwargs) 2025-12-04T09:28:37.1835333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1835982Z method(*args, **kwargs) 2025-12-04T09:28:37.1836588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1837230Z with policy(): 2025-12-04T09:28:37.1837819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T09:28:37.1838480Z raise RuntimeError(msg) 2025-12-04T09:28:37.1839962Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:37.1841395Z 2025-12-04T09:28:37.1841583Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1842818Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1843893Z 2025-12-04T09:28:37.1844139Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1844489Z 2025-12-04T09:28:37.1844665Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.1845021Z Traceback (most recent call last): 2025-12-04T09:28:37.1845706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1846402Z getattr(self, test_name)() 2025-12-04T09:28:37.1847048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1847720Z fn() 2025-12-04T09:28:37.1848282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1848947Z method(*args, **kwargs) 2025-12-04T09:28:37.1849558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1850221Z method(*args, **kwargs) 2025-12-04T09:28:37.1850836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1851487Z with policy(): 2025-12-04T09:28:37.1852078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1852740Z raise RuntimeError(msg) 2025-12-04T09:28:37.1854548Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 282001408 and is now 611254272. 2025-12-04T09:28:37.1856162Z 2025-12-04T09:28:37.1856372Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1857842Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1859030Z 2025-12-04T09:28:37.1859291Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1859683Z 2025-12-04T09:28:37.1859694Z 2025-12-04T09:28:37.1859913Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.1860527Z Process 1 terminated with exit code 10, terminating remaining processes. 
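The FutureWarning repeated throughout this run states that FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated in favor of torch.distributed.checkpoint.state_dict.get_state_dict()/set_state_dict(). A minimal sketch of that migration, assuming an FSDP-wrapped model and its optimizer already exist (the names below are placeholders, not code from this test file):

    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    # model: an FSDP-wrapped nn.Module; optim: its optimizer (both assumed to exist).
    model_sd, optim_sd = get_state_dict(model, optim)   # sharded state dicts for saving
    # ... checkpoint model_sd / optim_sd, e.g. with torch.distributed.checkpoint ...
    set_state_dict(
        model,
        optim,
        model_state_dict=model_sd,
        optim_state_dict=optim_sd,
    )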
2025-12-04T09:28:37.1861846Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2340b7a625d10704.xml - 2025-12-04T09:28:37.1863071Z =========================== short test summary info ============================ 2025-12-04T09:28:37.1864579Z FAILED [9.3968s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.1866107Z Traceback (most recent call last): 2025-12-04T09:28:37.1866785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1867484Z getattr(self, test_name)() 2025-12-04T09:28:37.1868141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1868812Z fn() 2025-12-04T09:28:37.1869363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1870064Z method(*args, **kwargs) 2025-12-04T09:28:37.1870684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1871338Z method(*args, **kwargs) 2025-12-04T09:28:37.1871977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1872628Z with policy(): 2025-12-04T09:28:37.1873222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1873882Z raise RuntimeError(msg) 2025-12-04T09:28:37.1875367Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 531562496 and is now 611254272. 
2025-12-04T09:28:37.1876801Z 2025-12-04T09:28:37.1876986Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1878217Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1879691Z 2025-12-04T09:28:37.1879964Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1880360Z 2025-12-04T09:28:37.1880518Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.1880923Z Traceback (most recent call last): 2025-12-04T09:28:37.1881697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.1882474Z getattr(self, test_name)() 2025-12-04T09:28:37.1883216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.1883986Z fn() 2025-12-04T09:28:37.1884723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1885467Z method(*args, **kwargs) 2025-12-04T09:28:37.1886171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.1886915Z method(*args, **kwargs) 2025-12-04T09:28:37.1887601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.1888338Z with policy(): 2025-12-04T09:28:37.1889010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.1889766Z raise RuntimeError(msg) 2025-12-04T09:28:37.1891544Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 282001408 and is now 611254272. 2025-12-04T09:28:37.1892978Z 2025-12-04T09:28:37.1893165Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.1894739Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1895917Z 2025-12-04T09:28:37.1896186Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.1896763Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:28:37.1897242Z ======================= 1 failed, 14 deselected in 9.61s ======================= 2025-12-04T09:28:37.1897691Z Got exit code 1 2025-12-04T09:28:37.1898823Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.1900364Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:37.1901656Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-6880e02fcbe22f17.xml 2025-12-04T09:28:37.1902700Z ============================= test session starts ============================== 2025-12-04T09:28:37.1903346Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.1903923Z cachedir: .pytest_cache 2025-12-04T09:28:37.1904622Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.1905397Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.1905835Z configfile: pytest.ini 2025-12-04T09:28:37.1906583Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.1908286Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.1909632Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.1910936Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.1912260Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.1912640Z collected 15 items / 8 deselected / 7 selected 2025-12-04T09:28:37.1912997Z stepcurrent: skipping 8 already run items. 2025-12-04T09:28:37.1913374Z Running 7 items in this shard 2025-12-04T09:28:37.1913558Z 2025-12-04T09:28:37.1914841Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda I1204 09:23:57.110000 42298 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 42350 2025-12-04T09:28:37.1916672Z I1204 09:23:57.111000 42298 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 42351 2025-12-04T09:28:37.1917665Z I1204 09:23:57.112000 42298 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 42352 2025-12-04T09:28:37.1918653Z I1204 09:23:57.112000 42298 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 42353 2025-12-04T09:28:37.1921345Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1923664Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1925948Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1928292Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1930559Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1933233Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1935963Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1938569Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1940486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1942462Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1944487Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1946539Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1948429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.1950331Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1952320Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1954060Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.1956400Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1958978Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1961403Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1963916Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1966329Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1968871Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1971161Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.1973520Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.1975579Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.1977575Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.1979922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1981910Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.1983897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1985885Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.1987875Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.1989853Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.1991118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.1992056Z local_shape = tensor.shape 2025-12-04T09:28:37.1992952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.1993916Z local_shape = tensor.shape 2025-12-04T09:28:37.1994806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.1995782Z local_shape = tensor.shape 2025-12-04T09:28:37.1996667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.1997593Z local_shape = tensor.shape 2025-12-04T09:28:37.1998474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.1999386Z tensor.shape, 2025-12-04T09:28:37.2000231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2001137Z tensor.shape, 2025-12-04T09:28:37.2001982Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:28:37.2002897Z tensor.shape, 2025-12-04T09:28:37.2003744Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2004643Z tensor.dtype, 2025-12-04T09:28:37.2005490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2006405Z tensor.dtype, 2025-12-04T09:28:37.2007243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2008142Z tensor.dtype, 2025-12-04T09:28:37.2009034Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2009951Z tensor.shape, 2025-12-04T09:28:37.2010795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2011704Z tensor.dtype, 2025-12-04T09:28:37.2012220Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2013165Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2014961Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2016542Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2018113Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2019562Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2020997Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2022515Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2024083Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2025630Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2027124Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2028443Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T09:28:37.2029758Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2031117Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2033388Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 0. CUDA driver allocated memory was 638517248 and is now 726597632. 2025-12-04T09:28:37.2035527Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2036501Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2038572Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2040337Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2041366Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2042550Z E1204 09:24:03.880000 42350 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.2043508Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2044443Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2045865Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2047260Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2048656Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2049944Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2051206Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2052587Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2054475Z E1204 09:24:03.881000 42353 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2055982Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2057493Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2058971Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2060465Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2061997Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2064552Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 3. CUDA driver allocated memory was 531562496 and is now 617545728. 2025-12-04T09:28:37.2066992Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2068075Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2070170Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2071932Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2072957Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2074134Z E1204 09:24:03.881000 42353 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.2075087Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2076030Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2077463Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2079164Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2080736Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2082248Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2083692Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2085247Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2086769Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2088289Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2089806Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2091386Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2092810Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2094460Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2097015Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 1. CUDA driver allocated memory was 518979584 and is now 613351424. 
2025-12-04T09:28:37.2097356Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2098084Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2099613Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2099948Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2100629Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2101157Z E1204 09:24:03.884000 42351 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.2101582Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2102090Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2103065Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2103536Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2104502Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2104899Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2106131Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2106535Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2107362Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2107764Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2108590Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2108970Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2109795Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2110205Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2111981Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.2112287Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2112844Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2114198Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2114496Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2115101Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2115571Z E1204 09:24:03.887000 42352 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.2115663Z FAILED [8.8657s] [ 14%] 2025-12-04T09:28:37.2115670Z 2025-12-04T09:28:37.2115805Z =================================== FAILURES =================================== 2025-12-04T09:28:37.2116389Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:37.2116495Z Traceback (most recent call last): 2025-12-04T09:28:37.2116978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.2117101Z self._join_processes(fn) 2025-12-04T09:28:37.2117623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.2117750Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.2118276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.2118416Z raise RuntimeError(error) 2025-12-04T09:28:37.2118622Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.2118725Z Traceback (most recent call last): 2025-12-04T09:28:37.2119206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2119301Z getattr(self, test_name)() 2025-12-04T09:28:37.2119773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2119854Z fn() 
2025-12-04T09:28:37.2120297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2120396Z method(*args, **kwargs) 2025-12-04T09:28:37.2120837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2120933Z method(*args, **kwargs) 2025-12-04T09:28:37.2121385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2121469Z with policy(): 2025-12-04T09:28:37.2121923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2122015Z raise RuntimeError(msg) 2025-12-04T09:28:37.2123373Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 0. CUDA driver allocated memory was 638517248 and is now 726597632. 2025-12-04T09:28:37.2123429Z 2025-12-04T09:28:37.2123628Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2124611Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2124617Z 2025-12-04T09:28:37.2124854Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2124859Z 2025-12-04T09:28:37.2124999Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.2125113Z Traceback (most recent call last): 2025-12-04T09:28:37.2125591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2125690Z getattr(self, test_name)() 2025-12-04T09:28:37.2126173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2126253Z fn() 2025-12-04T09:28:37.2126693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2126791Z method(*args, **kwargs) 2025-12-04T09:28:37.2127233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2127329Z method(*args, **kwargs) 2025-12-04T09:28:37.2127772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2127855Z with policy(): 2025-12-04T09:28:37.2128315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2128436Z raise RuntimeError(msg) 2025-12-04T09:28:37.2129789Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 3. CUDA driver allocated memory was 531562496 and is now 617545728. 
2025-12-04T09:28:37.2129829Z 2025-12-04T09:28:37.2130019Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2130992Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2130997Z 2025-12-04T09:28:37.2131237Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2131244Z 2025-12-04T09:28:37.2131248Z 2025-12-04T09:28:37.2131440Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.2131682Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:37.2132510Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-6880e02fcbe22f17.xml - 2025-12-04T09:28:37.2132666Z =========================== short test summary info ============================ 2025-12-04T09:28:37.2134035Z FAILED [8.8657s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.2134156Z Traceback (most recent call last): 2025-12-04T09:28:37.2134714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2134823Z getattr(self, test_name)() 2025-12-04T09:28:37.2135427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2135531Z fn() 2025-12-04T09:28:37.2136034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2136147Z method(*args, **kwargs) 2025-12-04T09:28:37.2136647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2136747Z method(*args, **kwargs) 2025-12-04T09:28:37.2137258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2137354Z with policy(): 2025-12-04T09:28:37.2137864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2137969Z raise RuntimeError(msg) 2025-12-04T09:28:37.2139503Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 0. CUDA driver allocated memory was 638517248 and is now 726597632. 
2025-12-04T09:28:37.2139511Z 2025-12-04T09:28:37.2139734Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2140832Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2140866Z 2025-12-04T09:28:37.2141135Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2141140Z 2025-12-04T09:28:37.2141302Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.2141425Z Traceback (most recent call last): 2025-12-04T09:28:37.2141981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2142137Z getattr(self, test_name)() 2025-12-04T09:28:37.2142681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2142768Z fn() 2025-12-04T09:28:37.2143264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2143379Z method(*args, **kwargs) 2025-12-04T09:28:37.2143877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2143978Z method(*args, **kwargs) 2025-12-04T09:28:37.2144485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2144581Z with policy(): 2025-12-04T09:28:37.2145102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2145205Z raise RuntimeError(msg) 2025-12-04T09:28:37.2146740Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 3. CUDA driver allocated memory was 531562496 and is now 617545728. 2025-12-04T09:28:37.2146754Z 2025-12-04T09:28:37.2146943Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2147913Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2147967Z 2025-12-04T09:28:37.2148209Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2148365Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.2148527Z ======================= 1 failed, 8 deselected in 9.08s ======================== 2025-12-04T09:28:37.2148612Z Got exit code 1 2025-12-04T09:28:37.2148702Z Retrying single test... 
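The failures above all come from the mem_leak_check mode requested for this shard: the harness snapshots per-device CUDA memory before the test and re-checks it afterwards, and raises the RuntimeError when both the caching allocator and the driver report growth the test did not release (here 4608 bytes of allocator memory on every rank). Below is a minimal, illustrative sketch of that kind of before/after comparison using only public torch.cuda calls; it is not the internal leak-check context manager seen in the traceback (the common_utils.py __exit__), and snapshot/check_cuda_leak are hypothetical helper names.

import torch

def snapshot(device):
    # Bytes currently held by the CUDA caching allocator on this device.
    allocator_bytes = torch.cuda.memory_allocated(device)
    # Driver-level view: total minus free bytes from cudaMemGetInfo.
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    return allocator_bytes, total_bytes - free_bytes

def check_cuda_leak(test_fn, device=0):
    # Illustrative before/after comparison only; the real harness does more
    # (e.g. garbage collection and a driver-level recheck) before reporting.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_before, driver_before = snapshot(device)
    test_fn()
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_after, driver_after = snapshot(device)
    if alloc_after > alloc_before and driver_after > driver_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after} bytes, driver "
            f"{driver_before} -> {driver_after} bytes"
        )

In the log, the allocator reading rises from 0 to 4608 bytes and the driver figure also grows on every rank, which is what the "CUDA driver API confirmed a leak" wording refers to; the printed PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 command reproduces the check locally for this single test.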
2025-12-04T09:28:37.2149385Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-a497c1942163e16f.xml 2025-12-04T09:28:37.2149523Z ============================= test session starts ============================== 2025-12-04T09:28:37.2149829Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.2149932Z cachedir: .pytest_cache 2025-12-04T09:28:37.2150388Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.2150504Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.2150603Z configfile: pytest.ini 2025-12-04T09:28:37.2151074Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.2152192Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.2152309Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.2153430Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.2153568Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.2153725Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.2154769Z stepcurrent: skipping 8 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2154865Z Running 1 items in this shard 2025-12-04T09:28:37.2154870Z 2025-12-04T09:28:37.2156157Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda I1204 09:24:10.379000 42631 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 42683 2025-12-04T09:28:37.2156599Z I1204 09:24:10.380000 42631 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 42684 2025-12-04T09:28:37.2157042Z I1204 09:24:10.381000 42631 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 42685 2025-12-04T09:28:37.2157475Z I1204 09:24:10.382000 42631 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 42686 2025-12-04T09:28:37.2159604Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.2159711Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2161851Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2161957Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2164064Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2164173Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2166264Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2166364Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2167897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2168043Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2169592Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2169708Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2171216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2171332Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2172849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2172961Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2175575Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2175691Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2178079Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2178188Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2180738Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2180852Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2183241Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2183408Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2185123Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2185327Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2187013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.2187183Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2188870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2189037Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2190928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2191078Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2191792Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2191961Z local_shape = tensor.shape 2025-12-04T09:28:37.2192680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2192767Z tensor.shape, 2025-12-04T09:28:37.2193482Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2193565Z tensor.dtype, 2025-12-04T09:28:37.2194269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2194375Z local_shape = tensor.shape 2025-12-04T09:28:37.2195088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2195196Z local_shape = tensor.shape 2025-12-04T09:28:37.2195902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2195998Z local_shape = tensor.shape 2025-12-04T09:28:37.2196712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2196795Z tensor.shape, 2025-12-04T09:28:37.2197515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2197639Z tensor.shape, 2025-12-04T09:28:37.2198352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:28:37.2198465Z tensor.dtype, 2025-12-04T09:28:37.2199173Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2199266Z tensor.shape, 2025-12-04T09:28:37.2199975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2200059Z tensor.dtype, 2025-12-04T09:28:37.2200775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2200860Z tensor.dtype, 2025-12-04T09:28:37.2201240Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2201694Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2202557Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2202988Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2203832Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2204168Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2205040Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2205447Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2206280Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2206680Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2207508Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2207879Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2208710Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2209117Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2210859Z E1204 09:24:17.130000 42684 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 1. CUDA driver allocated memory was 531562496 and is now 617545728. 2025-12-04T09:28:37.2211194Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2211747Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2213123Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2213480Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2214326Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2214850Z E1204 09:24:17.130000 42684 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.2215273Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2215781Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2216750Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2217233Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2218196Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2218630Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2219561Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2220018Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2220949Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2221403Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2222342Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2222758Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2223685Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2224149Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2226182Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 2025-12-04T09:28:37.2226538Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2227097Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2228454Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2228750Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2229370Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2229828Z E1204 09:24:17.132000 42683 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.2230200Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2230647Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2231502Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2231930Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2232829Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2233156Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2233986Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2234388Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2235218Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2235625Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2236451Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2236817Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2237645Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2238059Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2239813Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 3. CUDA driver allocated memory was 531562496 and is now 615448576. 
2025-12-04T09:28:37.2240142Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2240705Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2242064Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2242363Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2242975Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2243430Z E1204 09:24:17.132000 42686 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.2243809Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2244260Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2245118Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2245611Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2246459Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2246781Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2247608Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2248017Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2248846Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2249250Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2250078Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2250447Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2251279Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2251722Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2253521Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.2254054Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2254680Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2256215Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2256551Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2257244Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2257757Z E1204 09:24:17.138000 42685 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.2257856Z FAILED [8.8406s] [100%] 2025-12-04T09:28:37.2257863Z 2025-12-04T09:28:37.2258016Z =================================== FAILURES =================================== 2025-12-04T09:28:37.2258676Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:37.2258860Z Traceback (most recent call last): 2025-12-04T09:28:37.2259405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.2259519Z self._join_processes(fn) 2025-12-04T09:28:37.2260112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.2260250Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.2260850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.2260969Z raise RuntimeError(error) 2025-12-04T09:28:37.2261202Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.2261330Z Traceback (most recent call last): 2025-12-04T09:28:37.2261864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2261975Z getattr(self, test_name)() 2025-12-04T09:28:37.2262511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2262594Z fn() 
2025-12-04T09:28:37.2263094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2263204Z method(*args, **kwargs) 2025-12-04T09:28:37.2263702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2263817Z method(*args, **kwargs) 2025-12-04T09:28:37.2264314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2264437Z with policy(): 2025-12-04T09:28:37.2264953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2265061Z raise RuntimeError(msg) 2025-12-04T09:28:37.2266764Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 2025-12-04T09:28:37.2266770Z 2025-12-04T09:28:37.2266959Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2267938Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2267952Z 2025-12-04T09:28:37.2268184Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2268189Z 2025-12-04T09:28:37.2268337Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.2268450Z Traceback (most recent call last): 2025-12-04T09:28:37.2268935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2269031Z getattr(self, test_name)() 2025-12-04T09:28:37.2269512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2269589Z fn() 2025-12-04T09:28:37.2270037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2270127Z method(*args, **kwargs) 2025-12-04T09:28:37.2270571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2270667Z method(*args, **kwargs) 2025-12-04T09:28:37.2271160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2271246Z with policy(): 2025-12-04T09:28:37.2271698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2271790Z raise RuntimeError(msg) 2025-12-04T09:28:37.2273163Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 1. CUDA driver allocated memory was 531562496 and is now 617545728. 
2025-12-04T09:28:37.2273171Z 2025-12-04T09:28:37.2273361Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2274350Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2274357Z 2025-12-04T09:28:37.2274586Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2274591Z 2025-12-04T09:28:37.2274595Z 2025-12-04T09:28:37.2274786Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.2275025Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:37.2275905Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-a497c1942163e16f.xml - 2025-12-04T09:28:37.2287160Z =========================== short test summary info ============================ 2025-12-04T09:28:37.2288516Z FAILED [8.8406s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.2288758Z Traceback (most recent call last): 2025-12-04T09:28:37.2289331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2289456Z getattr(self, test_name)() 2025-12-04T09:28:37.2290023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2290116Z fn() 2025-12-04T09:28:37.2290628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2290745Z method(*args, **kwargs) 2025-12-04T09:28:37.2291355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2291460Z method(*args, **kwargs) 2025-12-04T09:28:37.2291964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2292059Z with policy(): 2025-12-04T09:28:37.2292558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2292664Z raise RuntimeError(msg) 2025-12-04T09:28:37.2294470Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 
2025-12-04T09:28:37.2294483Z 2025-12-04T09:28:37.2294710Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2295915Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2295925Z 2025-12-04T09:28:37.2296205Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2296211Z 2025-12-04T09:28:37.2296374Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.2296512Z Traceback (most recent call last): 2025-12-04T09:28:37.2297061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2297173Z getattr(self, test_name)() 2025-12-04T09:28:37.2297718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2297811Z fn() 2025-12-04T09:28:37.2298323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2298440Z method(*args, **kwargs) 2025-12-04T09:28:37.2298944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2299059Z method(*args, **kwargs) 2025-12-04T09:28:37.2299561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2299660Z with policy(): 2025-12-04T09:28:37.2300179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2300290Z raise RuntimeError(msg) 2025-12-04T09:28:37.2301837Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 1. CUDA driver allocated memory was 531562496 and is now 617545728. 2025-12-04T09:28:37.2301926Z 2025-12-04T09:28:37.2302149Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2303256Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2303263Z 2025-12-04T09:28:37.2303536Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2303718Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.2303900Z ======================= 1 failed, 14 deselected in 9.05s ======================= 2025-12-04T09:28:37.2304001Z Got exit code 1 2025-12-04T09:28:37.2304106Z Retrying single test... 
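[Note on the failure mode above] The RuntimeError comes from PyTorch's per-test CUDA memory-leak check: the repro command in the log enables it with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, and PYTORCH_PRINT_REPRO_ON_FAILURE=0 suppresses only the repro message, not the check. Judging from the message text ("Caching allocator allocated memory was ... and is now ...", "CUDA driver allocated memory was ... and is now ..."), the check compares caching-allocator and driver-level memory on each device before and after the test body and flags any growth. The sketch below is a minimal standalone illustration of that comparison, not the policy() context manager from torch/testing/_internal/common_utils.py that the traceback shows; the snapshot() helper and the deliberately kept tensor are hypothetical and only mirror the 4608-byte delta reported in the log.

import torch

def snapshot(device):
    # caching-allocator view: bytes currently allocated on this device
    allocator_bytes = torch.cuda.memory_allocated(device)
    # driver-level view of the same device: total minus free
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    return allocator_bytes, total_bytes - free_bytes

if torch.cuda.is_available():
    dev = torch.device("cuda", 0)
    torch.cuda.synchronize(dev)
    alloc_before, driver_before = snapshot(dev)

    # stand-in for the test body; a tensor kept alive past the end of the test
    # is the kind of growth the check reports (1152 float32 elements = 4608
    # bytes, the same caching-allocator delta the log shows)
    leaked = torch.ones(1152, device=dev)

    torch.cuda.synchronize(dev)
    alloc_after, driver_after = snapshot(dev)
    if alloc_after > alloc_before or driver_after > driver_before:
        # the driver-level delta is usually much larger than the allocator
        # delta, because the caching allocator reserves whole segments
        print(f"caching allocator allocated: {alloc_before} -> {alloc_after} bytes")
        print(f"driver allocated: {driver_before} -> {driver_after} bytes")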
2025-12-04T09:28:37.2304884Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-a4ee3bf5f7a9a01f.xml 2025-12-04T09:28:37.2305050Z ============================= test session starts ============================== 2025-12-04T09:28:37.2305409Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.2305629Z cachedir: .pytest_cache 2025-12-04T09:28:37.2306173Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.2306374Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.2306514Z configfile: pytest.ini 2025-12-04T09:28:37.2311119Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.2312816Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.2312970Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.2314145Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.2314294Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.2314442Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.2315574Z stepcurrent: skipping 8 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2315689Z Running 1 items in this shard 2025-12-04T09:28:37.2315695Z 2025-12-04T09:28:37.2317113Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda I1204 09:24:23.689000 42964 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 43016 2025-12-04T09:28:37.2317607Z I1204 09:24:23.690000 42964 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 43017 2025-12-04T09:28:37.2318081Z I1204 09:24:23.691000 42964 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 43018 2025-12-04T09:28:37.2318552Z I1204 09:24:23.692000 42964 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 43019 2025-12-04T09:28:37.2320979Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.2321123Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2323425Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2323535Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2325903Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2326015Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2328326Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2328499Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2330175Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2330294Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2332030Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2332157Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2334030Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2334163Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2335878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2336047Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2338439Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2338587Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2340971Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2341088Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2343448Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2343559Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2346191Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2346295Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2347793Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2347933Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2349436Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.2349577Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2351077Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2351213Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2352742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2352901Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2353615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2353718Z local_shape = tensor.shape 2025-12-04T09:28:37.2354426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2354516Z tensor.shape, 2025-12-04T09:28:37.2355229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2355311Z tensor.dtype, 2025-12-04T09:28:37.2356020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2356123Z local_shape = tensor.shape 2025-12-04T09:28:37.2356830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2356935Z local_shape = tensor.shape 2025-12-04T09:28:37.2357643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2357729Z tensor.shape, 2025-12-04T09:28:37.2358444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2358577Z tensor.shape, 2025-12-04T09:28:37.2359284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2359376Z tensor.dtype, 2025-12-04T09:28:37.2360082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:28:37.2360176Z tensor.dtype, 2025-12-04T09:28:37.2360885Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2360980Z local_shape = tensor.shape 2025-12-04T09:28:37.2361696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2361783Z tensor.shape, 2025-12-04T09:28:37.2362493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2362575Z tensor.dtype, 2025-12-04T09:28:37.2362959Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2363412Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2364270Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2364732Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2365591Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2365944Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2366778Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2367181Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2368016Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2368419Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2369253Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2369621Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2370444Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2370863Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2372653Z E1204 09:24:30.370000 43019 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 3. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:28:37.2372963Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2373575Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2375276Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2375616Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2376313Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2376827Z E1204 09:24:30.370000 43019 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.2377250Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2377761Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2378959Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2379455Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2380478Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2380843Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2381790Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2382247Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2383190Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2383643Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2384577Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2384994Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2385934Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2386470Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2388448Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 2. CUDA driver allocated memory was 523173888 and is now 615448576. 2025-12-04T09:28:37.2388779Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2389416Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2391118Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2391422Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2392027Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2392480Z E1204 09:24:30.372000 43018 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.2392861Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2393339Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2394208Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2394657Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2395502Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2395838Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2396664Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2397080Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2397900Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2398316Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2399142Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2399507Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2400412Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2400819Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2402557Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 
2025-12-04T09:28:37.2402856Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2403422Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2404776Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2405087Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2405694Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2406151Z E1204 09:24:30.374000 43016 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.2406564Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2407014Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2408138Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2408583Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2409478Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2409829Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2410699Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2411142Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2412017Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2412447Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2413381Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2414015Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2414967Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2415426Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2417397Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:28:37.2417736Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2418376Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2419904Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2420246Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2420938Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2421489Z E1204 09:24:30.376000 43017 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.2421624Z FAILED [8.5901s] [100%] 2025-12-04T09:28:37.2421631Z 2025-12-04T09:28:37.2421777Z =================================== FAILURES =================================== 2025-12-04T09:28:37.2422438Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:37.2422556Z Traceback (most recent call last): 2025-12-04T09:28:37.2423099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.2423217Z self._join_processes(fn) 2025-12-04T09:28:37.2423796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.2423942Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.2424552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.2424666Z raise RuntimeError(error) 2025-12-04T09:28:37.2424904Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.2425020Z Traceback (most recent call last): 2025-12-04T09:28:37.2425662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2425778Z getattr(self, test_name)() 2025-12-04T09:28:37.2426370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2426452Z fn() 
2025-12-04T09:28:37.2426902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2426997Z method(*args, **kwargs) 2025-12-04T09:28:37.2427499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2427590Z method(*args, **kwargs) 2025-12-04T09:28:37.2428029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2428125Z with policy(): 2025-12-04T09:28:37.2428570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2428672Z raise RuntimeError(msg) 2025-12-04T09:28:37.2430024Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 2. CUDA driver allocated memory was 523173888 and is now 615448576. 2025-12-04T09:28:37.2430032Z 2025-12-04T09:28:37.2430219Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2431200Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2431207Z 2025-12-04T09:28:37.2431442Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2431448Z 2025-12-04T09:28:37.2431452Z 2025-12-04T09:28:37.2431653Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.2431882Z Process 2 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:37.2432718Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-a4ee3bf5f7a9a01f.xml - 2025-12-04T09:28:37.2432894Z =========================== short test summary info ============================ 2025-12-04T09:28:37.2434007Z FAILED [8.5901s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.2434149Z Traceback (most recent call last): 2025-12-04T09:28:37.2434634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2434739Z getattr(self, test_name)() 2025-12-04T09:28:37.2435207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2435285Z fn() 2025-12-04T09:28:37.2435738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2435832Z method(*args, **kwargs) 2025-12-04T09:28:37.2436276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2436371Z method(*args, **kwargs) 2025-12-04T09:28:37.2436810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2436908Z with policy(): 2025-12-04T09:28:37.2437351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2437446Z raise RuntimeError(msg) 2025-12-04T09:28:37.2438809Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 2. CUDA driver allocated memory was 523173888 and is now 615448576. 2025-12-04T09:28:37.2438816Z 2025-12-04T09:28:37.2439056Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2440045Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2440051Z 2025-12-04T09:28:37.2440284Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2440451Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:28:37.2440599Z ======================= 1 failed, 14 deselected in 8.81s ======================= 2025-12-04T09:28:37.2440683Z Got exit code 1 2025-12-04T09:28:37.2441584Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2441944Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:37.2442619Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2d9ebf91db9daa02.xml 2025-12-04T09:28:37.2442765Z ============================= test session starts ============================== 2025-12-04T09:28:37.2443070Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.2443170Z cachedir: .pytest_cache 2025-12-04T09:28:37.2443625Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.2443781Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.2443878Z configfile: pytest.ini 2025-12-04T09:28:37.2444350Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.2445473Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.2445628Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.2446710Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.2446856Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.2446984Z collected 15 items / 9 deselected / 6 selected 2025-12-04T09:28:37.2447116Z stepcurrent: skipping 9 already run items. 2025-12-04T09:28:37.2447213Z Running 6 items in this shard 2025-12-04T09:28:37.2447219Z 2025-12-04T09:28:37.2448497Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda I1204 09:24:36.989000 43297 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 43349 2025-12-04T09:28:37.2448946Z I1204 09:24:36.990000 43297 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 43350 2025-12-04T09:28:37.2449384Z I1204 09:24:36.991000 43297 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 43351 2025-12-04T09:28:37.2449822Z I1204 09:24:36.992000 43297 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 43352 2025-12-04T09:28:37.2451998Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2452112Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2454532Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2454657Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2457042Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2457159Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2459524Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2459669Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2461437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2461563Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2463286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2463411Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2465130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.2465252Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2466948Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2467057Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2469220Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2469317Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2471410Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2471510Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2473597Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2473697Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2475794Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2475948Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2477456Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.2477608Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2479562Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2479732Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2481410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2481567Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2483359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2483516Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2484329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2484439Z local_shape = tensor.shape 2025-12-04T09:28:37.2485258Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2485362Z local_shape = tensor.shape 2025-12-04T09:28:37.2486161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2486279Z local_shape = tensor.shape 2025-12-04T09:28:37.2487074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2487177Z tensor.shape, 2025-12-04T09:28:37.2487974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2488065Z tensor.dtype, 2025-12-04T09:28:37.2488869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2489000Z tensor.shape, 2025-12-04T09:28:37.2489797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:28:37.2489900Z tensor.shape, 2025-12-04T09:28:37.2490692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2490844Z tensor.dtype, 2025-12-04T09:28:37.2491824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2491907Z tensor.dtype, 2025-12-04T09:28:37.2492624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2492719Z local_shape = tensor.shape 2025-12-04T09:28:37.2493487Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2493576Z tensor.shape, 2025-12-04T09:28:37.2494539Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2494640Z tensor.dtype, 2025-12-04T09:28:37.2495071Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2495585Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2496556Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2497034Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2498067Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2498436Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2499374Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2499830Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2500770Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2501224Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2502153Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2502571Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T09:28:37.2503493Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2504008Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2506062Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 0. CUDA driver allocated memory was 640614400 and is now 726597632. 2025-12-04T09:28:37.2506414Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2506965Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2508308Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2508612Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2509226Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2509691Z E1204 09:24:43.762000 43349 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.2510062Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2510514Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2511371Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2511842Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2512706Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2513031Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2513863Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2514267Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2515099Z E1204 09:24:43.765000 43351 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2515503Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2516316Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2516691Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2517518Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2517960Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2519725Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 2. CUDA driver allocated memory was 523173888 and is now 615448576. 2025-12-04T09:28:37.2520026Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2520584Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2521936Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2522241Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2522845Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2523310Z E1204 09:24:43.765000 43351 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.2523682Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2524137Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2525047Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2525471Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2526329Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2526653Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2527483Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2527887Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2528709Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2529115Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2529939Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2530335Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2531160Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2531597Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2533395Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 3. CUDA driver allocated memory was 313458688 and is now 613351424. 
2025-12-04T09:28:37.2533884Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2534512Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2536036Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2536376Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2537061Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2537581Z E1204 09:24:43.767000 43352 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.2538004Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2538634Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2539610Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2540081Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2541039Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2541405Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2542352Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2542806Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2543743Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2544203Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2545134Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2545585Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2546571Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2546987Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2548723Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.2549021Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2549582Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2550925Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2551224Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2551827Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2552349Z E1204 09:24:43.767000 43350 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.2552440Z FAILED [8.6040s] [ 16%] 2025-12-04T09:28:37.2552445Z 2025-12-04T09:28:37.2552571Z =================================== FAILURES =================================== 2025-12-04T09:28:37.2553155Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.2553259Z Traceback (most recent call last): 2025-12-04T09:28:37.2553744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.2553840Z self._join_processes(fn) 2025-12-04T09:28:37.2554352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.2554489Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.2555020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.2555123Z raise RuntimeError(error) 2025-12-04T09:28:37.2555337Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.2555444Z Traceback (most recent call last): 2025-12-04T09:28:37.2555921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2556016Z getattr(self, test_name)() 2025-12-04T09:28:37.2556486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2556569Z fn() 
2025-12-04T09:28:37.2557014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2557130Z method(*args, **kwargs) 2025-12-04T09:28:37.2557580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2557697Z method(*args, **kwargs) 2025-12-04T09:28:37.2558146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2558230Z with policy(): 2025-12-04T09:28:37.2558676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2558777Z raise RuntimeError(msg) 2025-12-04T09:28:37.2560137Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.2560145Z 2025-12-04T09:28:37.2560341Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2561319Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2561326Z 2025-12-04T09:28:37.2561562Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2561566Z 2025-12-04T09:28:37.2561570Z 2025-12-04T09:28:37.2561758Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.2561987Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:37.2562817Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2d9ebf91db9daa02.xml - 2025-12-04T09:28:37.2562963Z =========================== short test summary info ============================ 2025-12-04T09:28:37.2564124Z FAILED [8.6040s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.2564232Z Traceback (most recent call last): 2025-12-04T09:28:37.2564718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2564820Z getattr(self, test_name)() 2025-12-04T09:28:37.2565295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2565377Z fn() 2025-12-04T09:28:37.2565819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2565908Z method(*args, **kwargs) 2025-12-04T09:28:37.2566361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2566453Z method(*args, **kwargs) 2025-12-04T09:28:37.2566896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2566986Z with policy(): 2025-12-04T09:28:37.2567430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2567532Z raise RuntimeError(msg) 2025-12-04T09:28:37.2568887Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.2568919Z 2025-12-04T09:28:37.2569118Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2570109Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2570115Z 2025-12-04T09:28:37.2570347Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2570509Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.2570660Z ======================= 1 failed, 9 deselected in 8.82s ======================== 2025-12-04T09:28:37.2570750Z Got exit code 1 2025-12-04T09:28:37.2570841Z Retrying single test... 
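The failure above comes from PyTorch's CUDA memory-leak checker (enabled here via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1), which snapshots caching-allocator and driver-level memory around the test body and raises if either has grown afterwards. Below is a minimal sketch of that idea using only public torch.cuda APIs; `run_test` and the tolerance are hypothetical stand-ins, and the real check in torch/testing/_internal/common_utils.py is more involved (per-device tracking, retries, its own driver queries).

import torch

def check_for_cuda_leak(run_test, device=0, tol_bytes=0):
    # Rough analogue of the harness's leak check; `run_test` is a hypothetical
    # callable standing in for the test body being measured.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)     # caching-allocator view
    free_before, _total = torch.cuda.mem_get_info(device)  # driver view (cudaMemGetInfo)

    run_test()

    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)

    leaked_alloc = alloc_after - alloc_before
    leaked_driver = free_before - free_after
    if leaked_alloc > tol_bytes or leaked_driver > tol_bytes:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: caching allocator "
            f"+{leaked_alloc} bytes, driver +{leaked_driver} bytes"
        )

In the log above, every rank reports the same pattern: the caching allocator goes from 0 to 14848 bytes and driver-allocated memory grows by roughly 80-300 MB, so each process raises the RuntimeError, exits with code 10, and the per-file retry below reproduces the identical failure.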
2025-12-04T09:28:37.2571511Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-04afe2c287023adc.xml 2025-12-04T09:28:37.2571662Z ============================= test session starts ============================== 2025-12-04T09:28:37.2571967Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.2572059Z cachedir: .pytest_cache 2025-12-04T09:28:37.2572521Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.2572627Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.2572726Z configfile: pytest.ini 2025-12-04T09:28:37.2573266Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.2574641Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.2574844Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.2576062Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.2576223Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.2576369Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.2577539Z stepcurrent: skipping 9 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2577657Z Running 1 items in this shard 2025-12-04T09:28:37.2577663Z 2025-12-04T09:28:37.2579305Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda I1204 09:24:50.330000 43630 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 43682 2025-12-04T09:28:37.2579813Z I1204 09:24:50.331000 43630 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 43683 2025-12-04T09:28:37.2580299Z I1204 09:24:50.331000 43630 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 43684 2025-12-04T09:28:37.2580793Z I1204 09:24:50.332000 43630 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 43685 2025-12-04T09:28:37.2583195Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.2583423Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2585796Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2585906Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2588288Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2588397Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2590978Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2591075Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2592673Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2592787Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2594311Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2594425Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2595938Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2596047Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2597553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2597691Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2599802Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2599932Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2602034Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2602136Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2604236Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2604337Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2606512Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2606620Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2608120Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2608271Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2609772Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.2609916Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2611418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2611558Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2613060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2613277Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2614261Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2614374Z local_shape = tensor.shape 2025-12-04T09:28:37.2615173Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2615276Z tensor.shape, 2025-12-04T09:28:37.2616080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2616187Z tensor.dtype, 2025-12-04T09:28:37.2616990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2617101Z local_shape = tensor.shape 2025-12-04T09:28:37.2617905Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2618013Z local_shape = tensor.shape 2025-12-04T09:28:37.2618822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2618916Z tensor.shape, 2025-12-04T09:28:37.2619713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2619828Z local_shape = tensor.shape 2025-12-04T09:28:37.2620676Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2620781Z tensor.shape, 2025-12-04T09:28:37.2621575Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:28:37.2621664Z tensor.dtype, 2025-12-04T09:28:37.2622472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2622565Z tensor.dtype, 2025-12-04T09:28:37.2623369Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2623463Z tensor.shape, 2025-12-04T09:28:37.2624264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2624366Z tensor.dtype, 2025-12-04T09:28:37.2624795Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2625300Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2626434Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2626854Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2627741Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2628093Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2628920Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2629327Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2630150Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2630560Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2631382Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2631756Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2632571Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2632984Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2634811Z E1204 09:24:57.083000 43682 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 0. CUDA driver allocated memory was 628031488 and is now 724500480. 2025-12-04T09:28:37.2635108Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2635662Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2637009Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2637312Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2637923Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2638392Z E1204 09:24:57.083000 43682 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.2638765Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2639211Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2640066Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2640522Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2641377Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2641723Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2642555Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2642957Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2643778Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2644190Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2645009Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2645380Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2646202Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2646615Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2648396Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 3. CUDA driver allocated memory was 395247616 and is now 617545728. 2025-12-04T09:28:37.2648694Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2649258Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2650609Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2650913Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2651518Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2651977Z E1204 09:24:57.084000 43685 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.2652349Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2652793Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2653944Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2654467Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2655427Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2655789Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2656729Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2657186Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2658116Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2658581Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2659507Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2659928Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2660858Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2661369Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2663325Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 2. CUDA driver allocated memory was 527368192 and is now 615448576. 
2025-12-04T09:28:37.2663655Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2664288Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2665809Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2666228Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2666835Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2667295Z E1204 09:24:57.086000 43684 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.2667692Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2668138Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2669003Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2669453Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2670304Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2670626Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2671458Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2671865Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2672690Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2673099Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2673913Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2674288Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2675229Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2675637Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2677375Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.2677674Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2678234Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2679977Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2680315Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2680996Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2681583Z E1204 09:24:57.092000 43683 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.2681680Z FAILED [8.5871s] [100%] 2025-12-04T09:28:37.2681687Z 2025-12-04T09:28:37.2681839Z =================================== FAILURES =================================== 2025-12-04T09:28:37.2682528Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.2682648Z Traceback (most recent call last): 2025-12-04T09:28:37.2683189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.2683304Z self._join_processes(fn) 2025-12-04T09:28:37.2683881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.2684027Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.2684628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.2684736Z raise RuntimeError(error) 2025-12-04T09:28:37.2684978Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.2685096Z Traceback (most recent call last): 2025-12-04T09:28:37.2685638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2685743Z getattr(self, test_name)() 2025-12-04T09:28:37.2686269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2686361Z fn() 
2025-12-04T09:28:37.2686861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2686963Z method(*args, **kwargs) 2025-12-04T09:28:37.2687468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2687567Z method(*args, **kwargs) 2025-12-04T09:28:37.2688144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2688242Z with policy(): 2025-12-04T09:28:37.2688745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2688860Z raise RuntimeError(msg) 2025-12-04T09:28:37.2690391Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 0. CUDA driver allocated memory was 628031488 and is now 724500480. 2025-12-04T09:28:37.2690400Z 2025-12-04T09:28:37.2690620Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2691884Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2691891Z 2025-12-04T09:28:37.2692127Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2692137Z 2025-12-04T09:28:37.2692141Z 2025-12-04T09:28:37.2692330Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.2692559Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:37.2693439Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-04afe2c287023adc.xml - 2025-12-04T09:28:37.2693787Z =========================== short test summary info ============================ 2025-12-04T09:28:37.2695029Z FAILED [8.5871s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.2695176Z Traceback (most recent call last): 2025-12-04T09:28:37.2695721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2695833Z getattr(self, test_name)() 2025-12-04T09:28:37.2696363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2696449Z fn() 2025-12-04T09:28:37.2696955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2697059Z method(*args, **kwargs) 2025-12-04T09:28:37.2697563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2697667Z method(*args, **kwargs) 2025-12-04T09:28:37.2698166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2698265Z with policy(): 2025-12-04T09:28:37.2698768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2698881Z raise RuntimeError(msg) 2025-12-04T09:28:37.2700406Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 0. CUDA driver allocated memory was 628031488 and is now 724500480. 2025-12-04T09:28:37.2700415Z 2025-12-04T09:28:37.2700627Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2701773Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2701783Z 2025-12-04T09:28:37.2702042Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2702223Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.2702391Z ======================= 1 failed, 14 deselected in 8.80s ======================= 2025-12-04T09:28:37.2702484Z Got exit code 1 2025-12-04T09:28:37.2702593Z Retrying single test... 
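The FutureWarnings repeated throughout these retries all point at the same migration: FSDP.state_dict_type()/FSDP.set_state_dict_type() and ShardedTensor-based state dicts are deprecated in favor of the torch.distributed.checkpoint.state_dict APIs (get_state_dict/set_state_dict), which hand back DTensor-sharded entries. A minimal sketch of that recommended path, assuming an initialized process group, an FSDP-wrapped model, and its optimizer (the names here are illustrative, not taken from the test file):

from torch.distributed.checkpoint.state_dict import (
    StateDictOptions,
    get_state_dict,
    set_state_dict,
)

def state_dict_roundtrip(model, optimizer):
    # Gather sharded (DTensor-backed) model and optimizer state dicts instead of
    # calling FSDP.set_state_dict_type(...) followed by model.state_dict().
    options = StateDictOptions(full_state_dict=False, cpu_offload=False)
    model_sd, optim_sd = get_state_dict(model, optimizer, options=options)

    # ... persist/restore model_sd and optim_sd, e.g. via torch.distributed.checkpoint ...

    # Load them back onto the (possibly re-instantiated) model and optimizer.
    set_state_dict(
        model,
        optimizer,
        model_state_dict=model_sd,
        optim_state_dict=optim_sd,
        options=options,
    )
    return model_sd, optim_sd

Under FSDP the sharded entries come back as DTensor rather than ShardedTensor, which is also what the "Please use DTensor instead" warnings from _state_dict_utils.py are steering toward; the API doc and tutorial links are given in the warning text above.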
2025-12-04T09:28:37.2703352Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-80cc64b9f2eb85b8.xml 2025-12-04T09:28:37.2703517Z ============================= test session starts ============================== 2025-12-04T09:28:37.2703864Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.2703969Z cachedir: .pytest_cache 2025-12-04T09:28:37.2704488Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.2704606Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.2704706Z configfile: pytest.ini 2025-12-04T09:28:37.2705253Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.2706651Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.2706817Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.2707901Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.2708076Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.2708207Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.2709234Z stepcurrent: skipping 9 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2709338Z Running 1 items in this shard 2025-12-04T09:28:37.2709343Z 2025-12-04T09:28:37.2710622Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda I1204 09:25:03.670000 43963 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 44015 2025-12-04T09:28:37.2711068Z I1204 09:25:03.671000 43963 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 44016 2025-12-04T09:28:37.2711502Z I1204 09:25:03.671000 43963 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 44017 2025-12-04T09:28:37.2711930Z I1204 09:25:03.672000 43963 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 44018 2025-12-04T09:28:37.2714052Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.2714204Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2716325Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2716424Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2718741Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2718844Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2721075Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2721208Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2722839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2722983Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2724600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2724718Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2726332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2726619Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2728272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2728398Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2730842Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2730950Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2733240Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2733357Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2735889Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2736004Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2738372Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2738522Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2740226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2740415Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2742106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.2742267Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2743970Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2744126Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2745934Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2746186Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2746957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2747057Z local_shape = tensor.shape 2025-12-04T09:28:37.2747770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2747871Z local_shape = tensor.shape 2025-12-04T09:28:37.2748574Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2748678Z local_shape = tensor.shape 2025-12-04T09:28:37.2749388Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2749482Z local_shape = tensor.shape 2025-12-04T09:28:37.2750397Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2750489Z tensor.shape, 2025-12-04T09:28:37.2751244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2751332Z tensor.shape, 2025-12-04T09:28:37.2752084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2752205Z tensor.shape, 2025-12-04T09:28:37.2752952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2753040Z tensor.dtype, 2025-12-04T09:28:37.2753797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:28:37.2753913Z tensor.dtype, 2025-12-04T09:28:37.2754665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2754752Z tensor.shape, 2025-12-04T09:28:37.2755498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2755593Z tensor.dtype, 2025-12-04T09:28:37.2756341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2756438Z tensor.dtype, 2025-12-04T09:28:37.2756844Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2757317Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2758409Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2758878Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2759813Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2760304Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2761209Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2761726Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2762624Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2763071Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2763971Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2764387Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2765293Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2765738Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2767651Z E1204 09:25:10.405000 44016 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 1. CUDA driver allocated memory was 523173888 and is now 617545728. 2025-12-04T09:28:37.2768032Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2768648Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2770117Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2770455Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2771121Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2771629Z E1204 09:25:10.405000 44016 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.2772033Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2772515Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2773536Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2774185Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2775217Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2775586Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2776520Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2776994Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2777917Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2778386Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2779495Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2779918Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2780849Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2781307Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2783336Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 0. CUDA driver allocated memory was 628031488 and is now 722403328. 2025-12-04T09:28:37.2783703Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2784335Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2785864Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2786208Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2786895Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2787416Z E1204 09:25:10.406000 44015 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.2787836Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2788338Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2789308Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2789854Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2790915Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2791258Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2792134Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2792566Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2793440Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2793876Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2794748Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2795146Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2796018Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2796480Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2798334Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 
2025-12-04T09:28:37.2798673Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2799375Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2800732Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2801036Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2801645Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2802096Z E1204 09:25:10.407000 44018 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.2802475Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2802922Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2803834Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2804253Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2805104Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2805425Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2806242Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2806656Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2807482Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2807891Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2808710Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2809078Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2809938Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2810369Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2812126Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.2812423Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2812987Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2814665Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2815004Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2815690Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2816203Z E1204 09:25:10.407000 44017 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.2816313Z FAILED [8.6312s] [100%] 2025-12-04T09:28:37.2816319Z 2025-12-04T09:28:37.2816462Z =================================== FAILURES =================================== 2025-12-04T09:28:37.2817185Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.2817305Z Traceback (most recent call last): 2025-12-04T09:28:37.2817842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.2817961Z self._join_processes(fn) 2025-12-04T09:28:37.2818544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.2818691Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.2819292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.2819408Z raise RuntimeError(error) 2025-12-04T09:28:37.2819645Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.2819767Z Traceback (most recent call last): 2025-12-04T09:28:37.2820305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2820423Z getattr(self, test_name)() 2025-12-04T09:28:37.2820951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2821047Z fn() 
2025-12-04T09:28:37.2821547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2821646Z method(*args, **kwargs) 2025-12-04T09:28:37.2822152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2822282Z method(*args, **kwargs) 2025-12-04T09:28:37.2822779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2822885Z with policy(): 2025-12-04T09:28:37.2823415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2823528Z raise RuntimeError(msg) 2025-12-04T09:28:37.2825064Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.2825070Z 2025-12-04T09:28:37.2825290Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2826529Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2826538Z 2025-12-04T09:28:37.2826771Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2826776Z 2025-12-04T09:28:37.2826780Z 2025-12-04T09:28:37.2826980Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.2827210Z Process 2 terminated with exit code 10, terminating remaining processes. 
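The FutureWarnings repeated throughout this run point at the replacement for FSDP.set_state_dict_type(): the get_state_dict()/set_state_dict() APIs in torch.distributed.checkpoint.state_dict (see the doc URL quoted in the warning). A minimal sketch of that migration follows; the model and optimizer arguments are placeholders, not the objects used by this test.

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.checkpoint.state_dict import (
    StateDictOptions,
    get_state_dict,
    set_state_dict,
)

def checkpoint_roundtrip(model: FSDP, optimizer: torch.optim.Optimizer):
    # Extract sharded (non-full) state dicts without the deprecated
    # FSDP.set_state_dict_type() context manager.
    options = StateDictOptions(full_state_dict=False, cpu_offload=False)
    model_sd, optim_sd = get_state_dict(model, optimizer, options=options)

    # ... save / reload model_sd and optim_sd, e.g. with torch.distributed.checkpoint ...

    # Restore the state dicts onto the wrapped model and optimizer.
    set_state_dict(
        model,
        optimizer,
        model_state_dict=model_sd,
        optim_state_dict=optim_sd,
        options=options,
    )
    return model_sd, optim_sd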
2025-12-04T09:28:37.2828037Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-80cc64b9f2eb85b8.xml - 2025-12-04T09:28:37.2828188Z =========================== short test summary info ============================ 2025-12-04T09:28:37.2829345Z FAILED [8.6312s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.2829459Z Traceback (most recent call last): 2025-12-04T09:28:37.2829951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2830053Z getattr(self, test_name)() 2025-12-04T09:28:37.2830526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2830601Z fn() 2025-12-04T09:28:37.2831054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2831144Z method(*args, **kwargs) 2025-12-04T09:28:37.2831591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2831683Z method(*args, **kwargs) 2025-12-04T09:28:37.2832128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2832222Z with policy(): 2025-12-04T09:28:37.2832666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2832760Z raise RuntimeError(msg) 2025-12-04T09:28:37.2834125Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.2834130Z 2025-12-04T09:28:37.2834349Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2835333Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2835365Z 2025-12-04T09:28:37.2835598Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2835768Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:28:37.2835918Z ======================= 1 failed, 14 deselected in 8.84s ======================= 2025-12-04T09:28:37.2836003Z Got exit code 1 2025-12-04T09:28:37.2836906Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.2837262Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:37.2837947Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c38510c20e07f456.xml 2025-12-04T09:28:37.2838086Z ============================= test session starts ============================== 2025-12-04T09:28:37.2838390Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.2838487Z cachedir: .pytest_cache 2025-12-04T09:28:37.2838943Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.2839057Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.2839149Z configfile: pytest.ini 2025-12-04T09:28:37.2839620Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.2840791Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.2840910Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.2841999Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.2842136Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.2842263Z collected 15 items / 10 deselected / 5 selected 2025-12-04T09:28:37.2842392Z stepcurrent: skipping 10 already run items. 2025-12-04T09:28:37.2842487Z Running 5 items in this shard 2025-12-04T09:28:37.2842492Z 2025-12-04T09:28:37.2843776Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda I1204 09:25:16.999000 44296 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 44348 2025-12-04T09:28:37.2844221Z I1204 09:25:17.000000 44296 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 44349 2025-12-04T09:28:37.2844655Z I1204 09:25:17.001000 44296 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 44350 2025-12-04T09:28:37.2845091Z I1204 09:25:17.002000 44296 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 44351 2025-12-04T09:28:37.2847221Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2847362Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2849481Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2849581Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2851686Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2851790Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2854149Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2854269Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2856048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2856178Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2857901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2858021Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2859734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.2859856Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2861566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2861685Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.2864076Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2864252Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2866791Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2866894Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2869119Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2869229Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2871436Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2871591Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2873097Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.2873250Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2874748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2874899Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2876406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2876554Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2878054Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.2878218Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.2879271Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2879595Z local_shape = tensor.shape 2025-12-04T09:28:37.2880404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2880513Z local_shape = tensor.shape 2025-12-04T09:28:37.2881317Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2881423Z tensor.shape, 2025-12-04T09:28:37.2882224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2882328Z tensor.dtype, 2025-12-04T09:28:37.2883126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2883221Z tensor.shape, 2025-12-04T09:28:37.2884039Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2884147Z local_shape = tensor.shape 2025-12-04T09:28:37.2884954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:28:37.2885053Z tensor.dtype, 2025-12-04T09:28:37.2885856Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2885954Z tensor.shape, 2025-12-04T09:28:37.2886838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2886941Z tensor.dtype, 2025-12-04T09:28:37.2887746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2887857Z local_shape = tensor.shape 2025-12-04T09:28:37.2888661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2888755Z tensor.shape, 2025-12-04T09:28:37.2889552Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.2889654Z tensor.dtype, 2025-12-04T09:28:37.2890082Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2890595Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2891753Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2892184Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2893035Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2893476Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2894617Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2895074Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2896013Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2896470Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2897398Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2897827Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T09:28:37.2898753Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2899222Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2901236Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 2. CUDA driver allocated memory was 518979584 and is now 615448576. 2025-12-04T09:28:37.2901587Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2902210Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2903734Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2904070Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2904761Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2905282Z E1204 09:25:23.800000 44350 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.2905805Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2906369Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2907237Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2907683Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2908536Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2908891Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2909717Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2910121Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2910949Z E1204 09:25:23.800000 44348 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2911359Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2912180Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2912558Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2913381Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2913792Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2915577Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 0. CUDA driver allocated memory was 632225792 and is now 724500480. 2025-12-04T09:28:37.2915879Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2916442Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2917802Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2918100Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2918708Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2919173Z E1204 09:25:23.800000 44348 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.2919545Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2919991Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2920849Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2921313Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2922189Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2922513Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2923340Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2923741Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2924573Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2924977Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2925797Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2926169Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2926994Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2927412Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2929181Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 
2025-12-04T09:28:37.2929483Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2930038Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2931401Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2931695Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2932305Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2932771Z E1204 09:25:23.800000 44349 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.2933142Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.2933859Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.2934833Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2935345Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.2936312Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2936678Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.2937612Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2938076Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2939006Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2939461Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.2940387Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2940808Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.2941790Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2942256Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.2944209Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.2944545Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2945171Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2946803Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2947105Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.2947710Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2948171Z E1204 09:25:23.809000 44351 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.2948291Z FAILED [8.9284s] [ 20%] 2025-12-04T09:28:37.2948298Z 2025-12-04T09:28:37.2948437Z =================================== FAILURES =================================== 2025-12-04T09:28:37.2949010Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:37.2949142Z Traceback (most recent call last): 2025-12-04T09:28:37.2949634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.2949730Z self._join_processes(fn) 2025-12-04T09:28:37.2950238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.2950372Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.2950902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.2951015Z raise RuntimeError(error) 2025-12-04T09:28:37.2951218Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.2951326Z Traceback (most recent call last): 2025-12-04T09:28:37.2951813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2951912Z getattr(self, test_name)() 2025-12-04T09:28:37.2952383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2952466Z fn() 
2025-12-04T09:28:37.2952908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2953175Z method(*args, **kwargs) 2025-12-04T09:28:37.2953644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2953745Z method(*args, **kwargs) 2025-12-04T09:28:37.2954221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2954364Z with policy(): 2025-12-04T09:28:37.2954851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2954949Z raise RuntimeError(msg) 2025-12-04T09:28:37.2956557Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 0. CUDA driver allocated memory was 632225792 and is now 724500480. 2025-12-04T09:28:37.2956563Z 2025-12-04T09:28:37.2956799Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2957890Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2957901Z 2025-12-04T09:28:37.2958169Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2958177Z 2025-12-04T09:28:37.2958333Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.2958449Z Traceback (most recent call last): 2025-12-04T09:28:37.2958988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2959091Z getattr(self, test_name)() 2025-12-04T09:28:37.2959616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2959700Z fn() 2025-12-04T09:28:37.2960183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2960320Z method(*args, **kwargs) 2025-12-04T09:28:37.2960803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2960938Z method(*args, **kwargs) 2025-12-04T09:28:37.2961417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2961508Z with policy(): 2025-12-04T09:28:37.2962003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2962133Z raise RuntimeError(msg) 2025-12-04T09:28:37.2963644Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 2. CUDA driver allocated memory was 518979584 and is now 615448576. 
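The FutureWarning lines above ("Please use DTensor instead and we are deprecating ShardedTensor") point at the DTensor-based sharding API. A small sketch of building a sharded parameter as a DTensor, assuming a recent PyTorch where torch.distributed.tensor is public (older releases expose the same names under torch.distributed._tensor) and that the script is launched with torchrun so a process group can be initialized; this is illustrative only, not the test's own code:

    import torch
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor import Shard, distribute_tensor

    def make_sharded_param(world_size):
        # 1-D mesh over all ranks; shard dim 0 so each rank holds a slice.
        mesh = init_device_mesh("cuda", (world_size,))
        full = torch.randn(16, 8, device="cuda")
        return distribute_tensor(full, mesh, placements=[Shard(0)])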
2025-12-04T09:28:37.2963652Z 2025-12-04T09:28:37.2963858Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2964925Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2964940Z 2025-12-04T09:28:37.2965191Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2965197Z 2025-12-04T09:28:37.2965201Z 2025-12-04T09:28:37.2965409Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.2965671Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:37.2966598Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c38510c20e07f456.xml - 2025-12-04T09:28:37.2966769Z =========================== short test summary info ============================ 2025-12-04T09:28:37.2968037Z FAILED [8.9284s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.2968155Z Traceback (most recent call last): 2025-12-04T09:28:37.2968691Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2968798Z getattr(self, test_name)() 2025-12-04T09:28:37.2969320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2969404Z fn() 2025-12-04T09:28:37.2969992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2970095Z method(*args, **kwargs) 2025-12-04T09:28:37.2970568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2970662Z method(*args, **kwargs) 2025-12-04T09:28:37.2971136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2971226Z with policy(): 2025-12-04T09:28:37.2971711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2971810Z raise RuntimeError(msg) 2025-12-04T09:28:37.2973304Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 0. CUDA driver allocated memory was 632225792 and is now 724500480. 
2025-12-04T09:28:37.2973357Z 2025-12-04T09:28:37.2973561Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2974921Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2974927Z 2025-12-04T09:28:37.2975219Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2975225Z 2025-12-04T09:28:37.2975388Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.2975516Z Traceback (most recent call last): 2025-12-04T09:28:37.2976061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.2976174Z getattr(self, test_name)() 2025-12-04T09:28:37.2976716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.2976807Z fn() 2025-12-04T09:28:37.2977308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2977418Z method(*args, **kwargs) 2025-12-04T09:28:37.2977916Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.2978029Z method(*args, **kwargs) 2025-12-04T09:28:37.2978530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.2978809Z with policy(): 2025-12-04T09:28:37.2979323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.2979433Z raise RuntimeError(msg) 2025-12-04T09:28:37.2981072Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 2. CUDA driver allocated memory was 518979584 and is now 615448576. 2025-12-04T09:28:37.2981081Z 2025-12-04T09:28:37.2981294Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.2982408Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2982421Z 2025-12-04T09:28:37.2982703Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.2982907Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.2983097Z ======================= 1 failed, 10 deselected in 9.14s ======================= 2025-12-04T09:28:37.2983202Z Got exit code 1 2025-12-04T09:28:37.2983324Z Retrying single test... 
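The _join_processes / _check_return_codes traceback and the "Process 0 terminated with exit code 10" line above show the multi-process test pattern: each rank runs the test body in its own process, exits with a non-zero code when the leak check fails (10 here), and the parent raises after joining all workers. A simplified stand-in for that pattern, not the actual torch.testing._internal.common_distributed harness:

    import multiprocessing as mp
    import sys

    LEAK_EXIT_CODE = 10  # exit code reported for each failing rank in the log above

    def _worker(rank):
        try:
            pass  # a real harness would run the per-rank test body here
        except Exception:
            sys.exit(LEAK_EXIT_CODE)

    def run_multiprocess_test(world_size=4):
        procs = [mp.Process(target=_worker, args=(rank,)) for rank in range(world_size)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

    if __name__ == "__main__":
        run_multiprocess_test()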
2025-12-04T09:28:37.2984088Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-bec03360e514672a.xml 2025-12-04T09:28:37.2984251Z ============================= test session starts ============================== 2025-12-04T09:28:37.2984597Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.2984702Z cachedir: .pytest_cache 2025-12-04T09:28:37.2985226Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.2985345Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.2985485Z configfile: pytest.ini 2025-12-04T09:28:37.2986070Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.2987333Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.2987516Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.2988779Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.2988937Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.2989083Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.2990262Z stepcurrent: skipping 10 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.2990383Z Running 1 items in this shard 2025-12-04T09:28:37.2990390Z 2025-12-04T09:28:37.2991805Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda I1204 09:25:30.400000 44629 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 44681 2025-12-04T09:28:37.2992251Z I1204 09:25:30.400000 44629 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 44682 2025-12-04T09:28:37.2992705Z I1204 09:25:30.401000 44629 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 44683 2025-12-04T09:28:37.2993140Z I1204 09:25:30.402000 44629 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 44684 2025-12-04T09:28:37.2995545Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.2995657Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.2998146Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.2998261Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3000564Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3000666Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3002970Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3003128Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3004822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3004946Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3006614Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3006741Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3008394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3008627Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3010270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3010399Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3012674Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3012777Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3015269Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3015391Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3017864Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3018018Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3020384Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3020536Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3022236Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3022399Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3024103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.3024264Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3026057Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3026212Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3027995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3028149Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3028940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3029050Z local_shape = tensor.shape 2025-12-04T09:28:37.3029828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3029935Z tensor.shape, 2025-12-04T09:28:37.3030757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3030862Z tensor.dtype, 2025-12-04T09:28:37.3031641Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3031752Z local_shape = tensor.shape 2025-12-04T09:28:37.3032531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3032637Z local_shape = tensor.shape 2025-12-04T09:28:37.3033411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3033557Z local_shape = tensor.shape 2025-12-04T09:28:37.3034335Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3034467Z tensor.shape, 2025-12-04T09:28:37.3035242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3035335Z tensor.shape, 2025-12-04T09:28:37.3036118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:28:37.3036208Z tensor.dtype, 2025-12-04T09:28:37.3036987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3037083Z tensor.shape, 2025-12-04T09:28:37.3037862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3037958Z tensor.dtype, 2025-12-04T09:28:37.3038731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3038830Z tensor.dtype, 2025-12-04T09:28:37.3039346Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3039819Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3040738Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3041322Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3042356Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3042853Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3043889Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3044328Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3045386Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3045839Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3046736Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3047148Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3048047Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3048524Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3050464Z E1204 09:25:37.158000 44681 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 0. CUDA driver allocated memory was 628031488 and is now 726597632. 2025-12-04T09:28:37.3050791Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3051412Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3052897Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3053294Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3054137Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3054667Z E1204 09:25:37.158000 44681 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.3055086Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3055587Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3056632Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3057113Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3058078Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3058445Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3059369Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3059842Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3060768Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3061234Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3062161Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3062584Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3063555Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3064041Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3066181Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 3. CUDA driver allocated memory was 506396672 and is now 617545728. 2025-12-04T09:28:37.3066478Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3067047Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3068396Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3068705Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3069314Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3069772Z E1204 09:25:37.161000 44684 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.3070156Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3070650Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3071526Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3071947Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3072802Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3073127Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3073954Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3074368Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3075192Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3075603Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3076424Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3076835Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3077661Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3078104Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3080467Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 
2025-12-04T09:28:37.3080804Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3081447Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3082974Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3083315Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3083997Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3084517Z E1204 09:25:37.162000 44682 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.3085040Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3085545Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3086521Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3087003Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3087973Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3088347Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3089274Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3089739Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3090723Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3091189Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3092249Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3092703Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3093840Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3094303Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3096337Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424. 2025-12-04T09:28:37.3096678Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3097313Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3098830Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3099173Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3099926Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3100444Z E1204 09:25:37.164000 44683 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.3100558Z FAILED [8.9154s] [100%] 2025-12-04T09:28:37.3100565Z 2025-12-04T09:28:37.3100711Z =================================== FAILURES =================================== 2025-12-04T09:28:37.3101372Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:37.3101492Z Traceback (most recent call last): 2025-12-04T09:28:37.3102032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.3102153Z self._join_processes(fn) 2025-12-04T09:28:37.3102734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.3102892Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.3103494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.3103612Z raise RuntimeError(error) 2025-12-04T09:28:37.3103861Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.3103985Z Traceback (most recent call last): 2025-12-04T09:28:37.3104525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3104647Z getattr(self, test_name)() 2025-12-04T09:28:37.3105277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3105407Z fn() 
2025-12-04T09:28:37.3106065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3106170Z method(*args, **kwargs) 2025-12-04T09:28:37.3106677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3106772Z method(*args, **kwargs) 2025-12-04T09:28:37.3107243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3107344Z with policy(): 2025-12-04T09:28:37.3107993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3108106Z raise RuntimeError(msg) 2025-12-04T09:28:37.3109608Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.3109618Z 2025-12-04T09:28:37.3109831Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3110892Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3110898Z 2025-12-04T09:28:37.3111152Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3111157Z 2025-12-04T09:28:37.3111327Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.3111444Z Traceback (most recent call last): 2025-12-04T09:28:37.3111981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3112085Z getattr(self, test_name)() 2025-12-04T09:28:37.3112658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3112759Z fn() 2025-12-04T09:28:37.3113252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3113352Z method(*args, **kwargs) 2025-12-04T09:28:37.3113849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3113947Z method(*args, **kwargs) 2025-12-04T09:28:37.3114440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3114531Z with policy(): 2025-12-04T09:28:37.3115021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3115136Z raise RuntimeError(msg) 2025-12-04T09:28:37.3116623Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424. 
2025-12-04T09:28:37.3116631Z 2025-12-04T09:28:37.3116848Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3117904Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3117909Z 2025-12-04T09:28:37.3118199Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3118204Z 2025-12-04T09:28:37.3118208Z 2025-12-04T09:28:37.3118427Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.3118788Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:37.3119705Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-bec03360e514672a.xml - 2025-12-04T09:28:37.3119864Z =========================== short test summary info ============================ 2025-12-04T09:28:37.3121107Z FAILED [8.9154s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.3121215Z Traceback (most recent call last): 2025-12-04T09:28:37.3121702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3121811Z getattr(self, test_name)() 2025-12-04T09:28:37.3122285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3122372Z fn() 2025-12-04T09:28:37.3122820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3122913Z method(*args, **kwargs) 2025-12-04T09:28:37.3123370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3123461Z method(*args, **kwargs) 2025-12-04T09:28:37.3123905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3123999Z with policy(): 2025-12-04T09:28:37.3124450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3124552Z raise RuntimeError(msg) 2025-12-04T09:28:37.3125964Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 
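The _join_processes -> _check_return_codes frames in the parent traceback above show the shape of the multi-process harness: each rank runs the test body in its own spawned process, a failing rank exits with code 10, and the parent joins every rank and re-raises any nonzero exit code as a RuntimeError. A stripped-down illustration of that join-and-check pattern, using only the standard multiprocessing module and not torch's internal classes, is:

import multiprocessing as mp

LEAK_EXIT_CODE = 10  # matches the "exit code: 10" the child processes report above

def _run_test(rank: int) -> None:
    try:
        # The real harness calls getattr(self, test_name)() here; this stub succeeds.
        pass
    except RuntimeError:
        # A failing rank exits with a sentinel code instead of propagating the exception.
        raise SystemExit(LEAK_EXIT_CODE)

def join_and_check(world_size: int = 4) -> None:
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=_run_test, args=(rank,)) for rank in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    for rank, p in enumerate(procs):
        if p.exitcode != 0:
            # Mirrors _check_return_codes raising "Process N exited with error code ...".
            raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

if __name__ == "__main__":
    join_and_check()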
2025-12-04T09:28:37.3125972Z 2025-12-04T09:28:37.3126163Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3127150Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3127157Z 2025-12-04T09:28:37.3127393Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3127398Z 2025-12-04T09:28:37.3127551Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.3127659Z Traceback (most recent call last): 2025-12-04T09:28:37.3128153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3128256Z getattr(self, test_name)() 2025-12-04T09:28:37.3128731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3128818Z fn() 2025-12-04T09:28:37.3129259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3129352Z method(*args, **kwargs) 2025-12-04T09:28:37.3129806Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3129942Z method(*args, **kwargs) 2025-12-04T09:28:37.3130401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3130487Z with policy(): 2025-12-04T09:28:37.3130966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3131073Z raise RuntimeError(msg) 2025-12-04T09:28:37.3132423Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424. 2025-12-04T09:28:37.3132428Z 2025-12-04T09:28:37.3132630Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3133849Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3133855Z 2025-12-04T09:28:37.3134120Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3134309Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.3134480Z ======================= 1 failed, 14 deselected in 9.13s ======================= 2025-12-04T09:28:37.3134590Z Got exit code 1 2025-12-04T09:28:37.3134693Z Retrying single test... 
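The repro banner above already gives the exact shell command; an equivalent Python driver, assuming it is launched from the base repo dir as the banner instructs, simply forwards the two environment variables mentioned in the log:

import os
import subprocess
import sys

# Paths and test id copied from the repro banner printed above.
test_file = "test/distributed/fsdp/test_fsdp_dtensor_state_dict.py"
test_name = (
    "TestFSDPWithDeviceMeshAndDTensorCUDA."
    "test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda"
)

env = dict(
    os.environ,
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",   # enable the per-test leak check, as in this CI job
    # PYTORCH_PRINT_REPRO_ON_FAILURE="0",   # uncomment to suppress the repro banner
)
subprocess.run([sys.executable, test_file, test_name], env=env, check=False)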
2025-12-04T09:28:37.3135449Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7929a6c5753a5bf7.xml 2025-12-04T09:28:37.3135620Z ============================= test session starts ============================== 2025-12-04T09:28:37.3135972Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.3136086Z cachedir: .pytest_cache 2025-12-04T09:28:37.3136666Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.3136793Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.3136904Z configfile: pytest.ini 2025-12-04T09:28:37.3137438Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.3138692Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.3138832Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.3140072Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.3140239Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.3140388Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.3141571Z stepcurrent: skipping 10 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3141683Z Running 1 items in this shard 2025-12-04T09:28:37.3141688Z 2025-12-04T09:28:37.3143127Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda I1204 09:25:43.729000 44962 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 45014 2025-12-04T09:28:37.3143673Z I1204 09:25:43.730000 44962 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 45015 2025-12-04T09:28:37.3144167Z I1204 09:25:43.731000 44962 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 45016 2025-12-04T09:28:37.3144694Z I1204 09:25:43.732000 44962 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 45017 2025-12-04T09:28:37.3147057Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.3147172Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3149283Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3149395Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3151500Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3151658Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3153769Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3153863Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3155402Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3155520Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3157044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3157156Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3158680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3158823Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3160375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3160488Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3162608Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3162710Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3164817Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3164924Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3167079Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3167192Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3169294Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3169401Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3170914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3171068Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3172572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.3172756Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3174596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3174801Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3176504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3176674Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3177495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3177609Z local_shape = tensor.shape 2025-12-04T09:28:37.3178431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3178527Z tensor.shape, 2025-12-04T09:28:37.3179509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3179617Z tensor.dtype, 2025-12-04T09:28:37.3180419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3180543Z local_shape = tensor.shape 2025-12-04T09:28:37.3181458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3181553Z tensor.shape, 2025-12-04T09:28:37.3182363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3182461Z tensor.dtype, 2025-12-04T09:28:37.3183270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3183377Z local_shape = tensor.shape 2025-12-04T09:28:37.3184181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3184301Z local_shape = tensor.shape 2025-12-04T09:28:37.3185110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:28:37.3185216Z tensor.shape, 2025-12-04T09:28:37.3186011Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3186107Z tensor.shape, 2025-12-04T09:28:37.3186918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3187014Z tensor.dtype, 2025-12-04T09:28:37.3187811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3187942Z tensor.dtype, 2025-12-04T09:28:37.3188374Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3188896Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3189907Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3190505Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3191520Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3191869Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3192756Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3193188Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3194069Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3194496Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3195372Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3195988Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3196895Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3197353Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3199255Z E1204 09:25:50.461000 45016 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 2. CUDA driver allocated memory was 531562496 and is now 619642880. 2025-12-04T09:28:37.3199592Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3200206Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3201681Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3202008Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3202668Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3203278Z E1204 09:25:50.461000 45016 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.3203716Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3204207Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3205164Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3205635Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3206561Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3206919Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3207823Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3208265Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3209174Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3209615Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3210579Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3210985Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3211893Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3212347Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3214504Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 3. CUDA driver allocated memory was 506396672 and is now 617545728. 2025-12-04T09:28:37.3214856Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3215487Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3217012Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3217385Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3218078Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3218635Z E1204 09:25:50.465000 45017 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.3219057Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3219572Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3220541Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3221027Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3221985Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3222358Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3223293Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3223746Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3224687Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3225195Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3226228Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3226733Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3227607Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3228044Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3229896Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 0. CUDA driver allocated memory was 628031488 and is now 724500480. 
2025-12-04T09:28:37.3230218Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3230808Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3232349Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3232676Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3233315Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3233783Z E1204 09:25:50.491000 45014 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.3234156Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3234611Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3235473Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3235907Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3236760Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3237084Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3237919Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3238325Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3239510Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3239942Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3240824Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3241214Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3242084Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3260107Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3262099Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 1. CUDA driver allocated memory was 523173888 and is now 613351424. 2025-12-04T09:28:37.3262442Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3263064Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3264686Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3265070Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3265856Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3266468Z E1204 09:25:50.494000 45015 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.3266561Z FAILED [8.5910s] [100%] 2025-12-04T09:28:37.3266568Z 2025-12-04T09:28:37.3266715Z =================================== FAILURES =================================== 2025-12-04T09:28:37.3267324Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:37.3267442Z Traceback (most recent call last): 2025-12-04T09:28:37.3267972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.3268080Z self._join_processes(fn) 2025-12-04T09:28:37.3268633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.3268765Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.3269331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.3269447Z raise RuntimeError(error) 2025-12-04T09:28:37.3269665Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.3269787Z Traceback (most recent call last): 2025-12-04T09:28:37.3270352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3270454Z getattr(self, test_name)() 2025-12-04T09:28:37.3270964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3271047Z fn() 
2025-12-04T09:28:37.3271607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3271708Z method(*args, **kwargs) 2025-12-04T09:28:37.3272153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3272252Z method(*args, **kwargs) 2025-12-04T09:28:37.3272696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3272784Z with policy(): 2025-12-04T09:28:37.3273246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3273345Z raise RuntimeError(msg) 2025-12-04T09:28:37.3274702Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 2. CUDA driver allocated memory was 531562496 and is now 619642880. 2025-12-04T09:28:37.3274716Z 2025-12-04T09:28:37.3274905Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3275879Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3275913Z 2025-12-04T09:28:37.3276155Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3276161Z 2025-12-04T09:28:37.3276169Z 2025-12-04T09:28:37.3276361Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.3276626Z Process 2 terminated with exit code 10, terminating remaining processes. 
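Separately from the leak itself, the FutureWarning repeated throughout the run points at a migration path: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated in favor of the parallelism-agnostic get_state_dict()/set_state_dict() APIs in torch.distributed.checkpoint.state_dict. A minimal sketch of that swap, assuming model is an already-initialized FSDP-wrapped module and optim its optimizer, is:

import torch
from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

def roundtrip_state_dict(model: torch.nn.Module, optim: torch.optim.Optimizer) -> None:
    # Replaces the FSDP.state_dict_type(...) / FSDP.set_state_dict_type(...) pattern
    # flagged by the FutureWarning above with the parallel-agnostic APIs it recommends.
    model_sd, optim_sd = get_state_dict(model, optim)
    # ... persist model_sd / optim_sd, e.g. via torch.distributed.checkpoint ...
    set_state_dict(
        model,
        optim,
        model_state_dict=model_sd,
        optim_state_dict=optim_sd,
    )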
2025-12-04T09:28:37.3277448Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7929a6c5753a5bf7.xml - 2025-12-04T09:28:37.3277602Z =========================== short test summary info ============================ 2025-12-04T09:28:37.3278869Z FAILED [8.5910s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.3278989Z Traceback (most recent call last): 2025-12-04T09:28:37.3279715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3279902Z getattr(self, test_name)() 2025-12-04T09:28:37.3280440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3280525Z fn() 2025-12-04T09:28:37.3281032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3281143Z method(*args, **kwargs) 2025-12-04T09:28:37.3281640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3281742Z method(*args, **kwargs) 2025-12-04T09:28:37.3282251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3282343Z with policy(): 2025-12-04T09:28:37.3282962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3283073Z raise RuntimeError(msg) 2025-12-04T09:28:37.3284605Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 4608 on device 2. CUDA driver allocated memory was 531562496 and is now 619642880. 2025-12-04T09:28:37.3284611Z 2025-12-04T09:28:37.3284831Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3285933Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3285942Z 2025-12-04T09:28:37.3286215Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3286389Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:28:37.3286561Z ======================= 1 failed, 14 deselected in 8.80s ======================= 2025-12-04T09:28:37.3286667Z Got exit code 1 2025-12-04T09:28:37.3287684Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3288096Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:37.3288847Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-b0deb68b75574955.xml 2025-12-04T09:28:37.3289043Z ============================= test session starts ============================== 2025-12-04T09:28:37.3289404Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.3289552Z cachedir: .pytest_cache 2025-12-04T09:28:37.3290073Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.3290194Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.3290298Z configfile: pytest.ini 2025-12-04T09:28:37.3290843Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.3292328Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.3292468Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.3293896Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.3294054Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.3294213Z collected 15 items / 11 deselected / 4 selected 2025-12-04T09:28:37.3294350Z stepcurrent: skipping 11 already run items. 2025-12-04T09:28:37.3294468Z Running 4 items in this shard 2025-12-04T09:28:37.3294474Z 2025-12-04T09:28:37.3295912Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda I1204 09:25:57.030000 45295 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 45347 2025-12-04T09:28:37.3296413Z I1204 09:25:57.031000 45295 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 45348 2025-12-04T09:28:37.3296978Z I1204 09:25:57.031000 45295 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 45349 2025-12-04T09:28:37.3297469Z I1204 09:25:57.032000 45295 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 45350 2025-12-04T09:28:37.3299879Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3299990Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3302379Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3302492Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3304858Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3305001Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3307411Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3307534Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3309071Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3309190Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3310888Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3311014Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3312623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.3312872Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3314670Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3314800Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3317114Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3317236Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3319536Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3319656Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3321955Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3322131Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3324482Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3324582Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3326246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.3326392Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3327902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3328042Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3329601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3329747Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3331250Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3331393Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3332106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3332216Z local_shape = tensor.shape 2025-12-04T09:28:37.3332931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3333025Z tensor.shape, 2025-12-04T09:28:37.3333990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3334085Z tensor.dtype, 2025-12-04T09:28:37.3334895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3335039Z local_shape = tensor.shape 2025-12-04T09:28:37.3335855Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3335949Z tensor.shape, 2025-12-04T09:28:37.3336780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3336899Z local_shape = tensor.shape 2025-12-04T09:28:37.3337697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:28:37.3337796Z tensor.dtype, 2025-12-04T09:28:37.3338594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3338687Z tensor.shape, 2025-12-04T09:28:37.3339502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3339599Z tensor.dtype, 2025-12-04T09:28:37.3340391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3340511Z local_shape = tensor.shape 2025-12-04T09:28:37.3341303Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3341405Z tensor.shape, 2025-12-04T09:28:37.3342200Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3342292Z tensor.dtype, 2025-12-04T09:28:37.3342731Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3343289Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3344274Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3344752Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3345823Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3346270Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3347096Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3347512Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3348336Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3348743Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3349564Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3349966Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T09:28:37.3350824Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3351230Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3352977Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 2. CUDA driver allocated memory was 531562496 and is now 619642880. 2025-12-04T09:28:37.3353279Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3353844Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3355188Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3355492Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3356097Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3356604Z E1204 09:26:03.787000 45349 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.3356985Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3357427Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3358289Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3358712Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3359565Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3359902Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3360721Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3361130Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3361945Z E1204 09:26:03.788000 45347 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3362395Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3363217Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3363609Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3364438Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3364843Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3366584Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 0. CUDA driver allocated memory was 628031488 and is now 722403328. 2025-12-04T09:28:37.3366885Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3367450Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3368787Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3369094Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3369767Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3370227Z E1204 09:26:03.788000 45347 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.3370609Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3371049Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3371906Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3372326Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3373233Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3373743Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3374668Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3375135Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3376057Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3376560Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3377511Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3377924Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3379194Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3379659Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3381627Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 1. CUDA driver allocated memory was 527368192 and is now 613351424. 
2025-12-04T09:28:37.3381962Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3382597Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3384112Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3384551Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3385237Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3385752Z E1204 09:26:03.789000 45348 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.3386177Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3386679Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3387655Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3388136Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3389094Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3389467Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3390397Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3390956Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3391875Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3392343Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3393212Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3393598Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3394483Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3394917Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3396750Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 3. CUDA driver allocated memory was 395247616 and is now 613351424. 2025-12-04T09:28:37.3397061Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3397662Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3399223Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3399529Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3400136Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3400588Z E1204 09:26:03.789000 45350 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.3400686Z FAILED [8.6000s] [ 25%] 2025-12-04T09:28:37.3400691Z 2025-12-04T09:28:37.3400818Z =================================== FAILURES =================================== 2025-12-04T09:28:37.3401397Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.3401501Z Traceback (most recent call last): 2025-12-04T09:28:37.3401989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.3402098Z self._join_processes(fn) 2025-12-04T09:28:37.3402610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.3402737Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.3403276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.3403375Z raise RuntimeError(error) 2025-12-04T09:28:37.3403587Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.3403718Z Traceback (most recent call last): 2025-12-04T09:28:37.3404193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3404299Z getattr(self, test_name)() 2025-12-04T09:28:37.3404795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3404874Z fn() 
2025-12-04T09:28:37.3405322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3405414Z method(*args, **kwargs) 2025-12-04T09:28:37.3405863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3405953Z method(*args, **kwargs) 2025-12-04T09:28:37.3406394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3406487Z with policy(): 2025-12-04T09:28:37.3406936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3407040Z raise RuntimeError(msg) 2025-12-04T09:28:37.3408400Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 2. CUDA driver allocated memory was 531562496 and is now 619642880. 2025-12-04T09:28:37.3408406Z 2025-12-04T09:28:37.3408596Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3409569Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3409576Z 2025-12-04T09:28:37.3409808Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3409813Z 2025-12-04T09:28:37.3409817Z 2025-12-04T09:28:37.3410072Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.3410306Z Process 2 terminated with exit code 10, terminating remaining processes. 
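The traceback above shows the failure is raised when the test's memory-leak-check context manager exits (common_utils.py line 2705, reached via "with policy():"): the harness snapshots CUDA memory before the test body and compares again afterwards. A rough sketch of that before/after comparison using public torch.cuda APIs — the function name and structure here are illustrative, not the actual internal harness:

    import torch

    def check_for_cuda_leak(run_test, device=0):
        # Snapshot caching-allocator and driver-level usage before the test body.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        allocator_before = torch.cuda.memory_allocated(device)
        free_before, total = torch.cuda.mem_get_info(device)
        driver_before = total - free_before

        run_test()

        # Re-check afterwards; growth that survives empty_cache() is reported the
        # same way the log above reports it ("... was 0 and is now reported as ...").
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        allocator_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after

        # Only flag a leak when the driver-level numbers confirm the allocator growth,
        # mirroring the "CUDA driver API confirmed a leak" wording in the failure.
        if allocator_after > allocator_before and driver_after > driver_before:
            raise RuntimeError(
                f"Caching allocator allocated memory was {allocator_before} and is now "
                f"reported as {allocator_after} on device {device}. CUDA driver allocated "
                f"memory was {driver_before} and is now {driver_after}."
            )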
2025-12-04T09:28:37.3411136Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-b0deb68b75574955.xml -
2025-12-04T09:28:37.3411285Z =========================== short test summary info ============================
2025-12-04T09:28:37.3412380Z FAILED [8.6000s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda - RuntimeError: Process 2 exited with error code 10 and exception:
2025-12-04T09:28:37.3412495Z Traceback (most recent call last):
2025-12-04T09:28:37.3412981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:28:37.3413090Z getattr(self, test_name)()
2025-12-04T09:28:37.3413793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:28:37.3413883Z fn()
2025-12-04T09:28:37.3414402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:28:37.3414507Z method(*args, **kwargs)
2025-12-04T09:28:37.3415009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:28:37.3415119Z method(*args, **kwargs)
2025-12-04T09:28:37.3416100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:28:37.3416200Z with policy():
2025-12-04T09:28:37.3416708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:28:37.3416860Z raise RuntimeError(msg)
2025-12-04T09:28:37.3418397Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 2. CUDA driver allocated memory was 531562496 and is now 619642880.
2025-12-04T09:28:37.3418403Z
2025-12-04T09:28:37.3418616Z To execute this test, run the following from the base repo dir:
2025-12-04T09:28:37.3419707Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda
2025-12-04T09:28:37.3419715Z
2025-12-04T09:28:37.3419982Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:28:37.3420159Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:28:37.3420340Z ======================= 1 failed, 11 deselected in 8.81s =======================
2025-12-04T09:28:37.3420430Z Got exit code 1
2025-12-04T09:28:37.3420542Z Retrying single test...
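The FutureWarnings throughout the run above come from the test calling FSDP.set_state_dict_type at test_fsdp_dtensor_state_dict.py lines 113 and 124; the warning text points at get_state_dict()/set_state_dict() from torch.distributed.checkpoint.state_dict as the replacement. A minimal sketch of that newer API, assuming the argument names in the linked docs (verify the exact StateDictOptions fields against the installed PyTorch version):

    from torch.distributed.checkpoint.state_dict import (
        StateDictOptions,
        get_state_dict,
        set_state_dict,
    )

    def roundtrip_state(model, optimizer):
        # Replacement for FSDP.state_dict_type()/set_state_dict_type(); per the
        # deprecation message it covers FSDP1, FSDP2 and DDP wrapped modules.
        options = StateDictOptions(full_state_dict=False, cpu_offload=True)
        model_sd, optim_sd = get_state_dict(model, optimizer, options=options)

        # ... persist model_sd / optim_sd, e.g. via torch.distributed.checkpoint ...

        set_state_dict(
            model,
            optimizer,
            model_state_dict=model_sd,
            optim_state_dict=optim_sd,
            options=options,
        )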
2025-12-04T09:28:37.3421300Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-eda6c23a06d3c574.xml 2025-12-04T09:28:37.3421456Z ============================= test session starts ============================== 2025-12-04T09:28:37.3421807Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.3421915Z cachedir: .pytest_cache 2025-12-04T09:28:37.3422433Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.3422551Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.3422724Z configfile: pytest.ini 2025-12-04T09:28:37.3423267Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.3424519Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.3424650Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.3425980Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.3426116Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.3426258Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.3427289Z stepcurrent: skipping 11 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3427397Z Running 1 items in this shard 2025-12-04T09:28:37.3427402Z 2025-12-04T09:28:37.3428666Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda I1204 09:26:10.370000 45628 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 45680 2025-12-04T09:28:37.3429109Z I1204 09:26:10.371000 45628 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 45681 2025-12-04T09:28:37.3429580Z I1204 09:26:10.372000 45628 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 45682 2025-12-04T09:28:37.3430011Z I1204 09:26:10.372000 45628 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 45683 2025-12-04T09:28:37.3432173Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.3432273Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3434399Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3434498Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3436604Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3436701Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3438862Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3438963Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3440480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3440599Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3442118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3442241Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3443754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3443896Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3445403Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3445548Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3447658Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3447764Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3450036Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3450139Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3452432Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3452541Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3455093Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3455203Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3456924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3457090Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3458800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.3458959Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3460665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3460881Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3462571Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3462726Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3463532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3463654Z local_shape = tensor.shape 2025-12-04T09:28:37.3464457Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3464579Z local_shape = tensor.shape 2025-12-04T09:28:37.3465379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3465602Z local_shape = tensor.shape 2025-12-04T09:28:37.3466384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3466475Z tensor.shape, 2025-12-04T09:28:37.3467252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3467347Z tensor.shape, 2025-12-04T09:28:37.3468173Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3468275Z tensor.shape, 2025-12-04T09:28:37.3469047Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3469139Z tensor.dtype, 2025-12-04T09:28:37.3469915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3470003Z tensor.dtype, 2025-12-04T09:28:37.3470785Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:28:37.3470890Z local_shape = tensor.shape 2025-12-04T09:28:37.3471777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3471869Z tensor.dtype, 2025-12-04T09:28:37.3472615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3472711Z tensor.shape, 2025-12-04T09:28:37.3473461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3473546Z tensor.dtype, 2025-12-04T09:28:37.3473953Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3474455Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3475560Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3476062Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3476997Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3477353Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3478250Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3478852Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3479942Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3480403Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3481332Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3481751Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3482778Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3483237Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3485193Z E1204 09:26:17.056000 45680 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 0. CUDA driver allocated memory was 630128640 and is now 722403328. 2025-12-04T09:28:37.3485525Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3486159Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3487668Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3488006Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3488690Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3489239Z E1204 09:26:17.056000 45680 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.3489665Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3490161Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3491371Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3491815Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3492727Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3493072Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3494218Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3494685Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3495614Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3496072Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3496998Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3497472Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3498405Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3498861Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3500817Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.3501156Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3501785Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3503300Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3503633Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3504347Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3504867Z E1204 09:26:17.057000 45681 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.3505322Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3505922Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3506782Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3507199Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3508047Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3508375Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3509196Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3509605Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3510421Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3510826Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3511692Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3512060Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3512886Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3513288Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3515043Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424. 
2025-12-04T09:28:37.3515340Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3515897Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3517244Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3517566Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3518169Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3518652Z E1204 09:26:17.058000 45682 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.3519029Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3519467Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3520328Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3520750Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3521596Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3521928Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3522940Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3523373Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3524418Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3524923Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3525832Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3526282Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3527217Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3527663Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3529566Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 3. CUDA driver allocated memory was 309264384 and is now 613351424. 2025-12-04T09:28:37.3529891Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3530503Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3531983Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3532374Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3533060Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3533786Z E1204 09:26:17.059000 45683 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.3533898Z FAILED [8.4968s] [100%] 2025-12-04T09:28:37.3533904Z 2025-12-04T09:28:37.3534049Z =================================== FAILURES =================================== 2025-12-04T09:28:37.3534700Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.3534844Z Traceback (most recent call last): 2025-12-04T09:28:37.3535405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.3535524Z self._join_processes(fn) 2025-12-04T09:28:37.3536104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.3536252Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.3536847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.3536960Z raise RuntimeError(error) 2025-12-04T09:28:37.3537197Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.3537312Z Traceback (most recent call last): 2025-12-04T09:28:37.3537867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3537983Z getattr(self, test_name)() 2025-12-04T09:28:37.3538666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3538761Z fn() 
2025-12-04T09:28:37.3539260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3539359Z method(*args, **kwargs) 2025-12-04T09:28:37.3539865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3539967Z method(*args, **kwargs) 2025-12-04T09:28:37.3540459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3540560Z with policy(): 2025-12-04T09:28:37.3541088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3541200Z raise RuntimeError(msg) 2025-12-04T09:28:37.3542721Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424. 2025-12-04T09:28:37.3542730Z 2025-12-04T09:28:37.3542950Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3544033Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3544039Z 2025-12-04T09:28:37.3544298Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3544331Z 2025-12-04T09:28:37.3544346Z 2025-12-04T09:28:37.3544562Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.3544830Z Process 2 terminated with exit code 10, terminating remaining processes. 
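The traceback above shows the shape of the harness failure: the parent test process starts one worker per rank (four here), joins them in _join_processes, and _check_return_codes turns any nonzero worker exit code into the "Process 2 exited with error code 10" RuntimeError. The following is only a rough, hypothetical illustration of that control flow in plain Python multiprocessing, not the actual torch.testing._internal.common_distributed code; the rank choices and exit code are stand-ins taken from the log.

import multiprocessing as mp
import sys

WORLD_SIZE = 4               # the log shows four worker processes (ranks 0-3)
TEST_FAILED_EXIT_CODE = 10   # reported above as "error code 10"


def worker(rank: int) -> None:
    # Stand-in for one per-rank test process; a failing rank exits non-zero.
    if rank in (2, 3):       # pretend these ranks detected a problem
        sys.exit(TEST_FAILED_EXIT_CODE)


def run_test() -> None:
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=worker, args=(rank,)) for rank in range(WORLD_SIZE)]
    for p in procs:
        p.start()
    for p in procs:                      # analogous to _join_processes above
        p.join()
    for rank, p in enumerate(procs):     # analogous to _check_return_codes above
        if p.exitcode != 0:
            raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")


if __name__ == "__main__":
    run_test()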
2025-12-04T09:28:37.3546040Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-eda6c23a06d3c574.xml - 2025-12-04T09:28:37.3546187Z =========================== short test summary info ============================ 2025-12-04T09:28:37.3547300Z FAILED [8.4968s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.3547416Z Traceback (most recent call last): 2025-12-04T09:28:37.3547897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3548003Z getattr(self, test_name)() 2025-12-04T09:28:37.3548478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3548558Z fn() 2025-12-04T09:28:37.3549011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3549098Z method(*args, **kwargs) 2025-12-04T09:28:37.3549548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3549636Z method(*args, **kwargs) 2025-12-04T09:28:37.3550076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3550335Z with policy(): 2025-12-04T09:28:37.3550810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3550906Z raise RuntimeError(msg) 2025-12-04T09:28:37.3552421Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424. 2025-12-04T09:28:37.3552429Z 2025-12-04T09:28:37.3552632Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3553656Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3553664Z 2025-12-04T09:28:37.3553908Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3554079Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.3554239Z ======================= 1 failed, 14 deselected in 8.71s ======================= 2025-12-04T09:28:37.3554332Z Got exit code 1 2025-12-04T09:28:37.3554436Z Retrying single test... 
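Each worker's RuntimeError above comes from the memory-leak policy enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: caching-allocator bytes and driver-level allocated bytes are snapshotted around the test body, and growth in both is reported as a leak (e.g. "Caching allocator allocated memory was 0 and is now reported as 14848 on device 3"). Below is a minimal, hypothetical sketch of that before/after comparison, not the real check in torch/testing/_internal/common_utils.py; it assumes a machine with at least one CUDA device, and the helper name check_for_leak is made up for illustration.

import torch


def check_for_leak(fn, device: int = 0) -> None:
    if not torch.cuda.is_available():
        return  # nothing to check without a CUDA device
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes
    free_before, total = torch.cuda.mem_get_info(device)
    driver_before = total - free_before                   # driver-level allocated bytes

    fn()

    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after

    if alloc_after > alloc_before and driver_after > driver_before:
        raise RuntimeError(
            f"possible CUDA leak: caching allocator {alloc_before} -> {alloc_after}, "
            f"driver {driver_before} -> {driver_after} on device {device}"
        )


if __name__ == "__main__" and torch.cuda.is_available():
    kept = []   # keeping a reference alive past the check simulates a leaked allocation
    try:
        check_for_leak(lambda: kept.append(torch.ones(1024, device="cuda")))
    except RuntimeError as e:
        print(e)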
2025-12-04T09:28:37.3555148Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2e17c39fb483ae46.xml 2025-12-04T09:28:37.3555302Z ============================= test session starts ============================== 2025-12-04T09:28:37.3555625Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.3555724Z cachedir: .pytest_cache 2025-12-04T09:28:37.3556211Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.3556351Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.3556445Z configfile: pytest.ini 2025-12-04T09:28:37.3556950Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.3558155Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.3558289Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.3559437Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.3559581Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.3559728Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.3560836Z stepcurrent: skipping 11 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3560953Z Running 1 items in this shard 2025-12-04T09:28:37.3560958Z 2025-12-04T09:28:37.3562408Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda I1204 09:26:23.559000 45961 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 46013 2025-12-04T09:28:37.3562857Z I1204 09:26:23.560000 45961 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 46014 2025-12-04T09:28:37.3563296Z I1204 09:26:23.561000 45961 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 46015 2025-12-04T09:28:37.3563773Z I1204 09:26:23.562000 45961 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 46016 2025-12-04T09:28:37.3566081Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.3566186Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3568476Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3568583Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3570814Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3570915Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3573245Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:113: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3573384Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3575269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3575397Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3577147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3577270Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3579175Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3579310Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3581119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3581252Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3583637Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3583752Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3586126Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3586239Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3588611Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3588764Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3591185Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:124: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3591340Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3592847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3592995Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3594503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.3594643Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3596142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3596336Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3597836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3597973Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:28:37.3598696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3598797Z local_shape = tensor.shape 2025-12-04T09:28:37.3599506Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3599602Z tensor.shape, 2025-12-04T09:28:37.3600307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3600407Z local_shape = tensor.shape 2025-12-04T09:28:37.3601143Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3601225Z tensor.dtype, 2025-12-04T09:28:37.3601933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3602042Z tensor.shape, 2025-12-04T09:28:37.3602915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3603024Z local_shape = tensor.shape 2025-12-04T09:28:37.3603800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3603901Z local_shape = tensor.shape 2025-12-04T09:28:37.3604645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3604735Z tensor.dtype, 2025-12-04T09:28:37.3605490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:28:37.3605581Z tensor.shape, 2025-12-04T09:28:37.3606340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3606426Z tensor.shape, 2025-12-04T09:28:37.3607359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3607460Z tensor.dtype, 2025-12-04T09:28:37.3608237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:28:37.3608331Z tensor.dtype, 2025-12-04T09:28:37.3608746Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3609234Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3610243Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3610713Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3611664Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3612016Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3612916Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3613448Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3614531Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3614993Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3615921Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3616344Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3617315Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3617776Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3619774Z E1204 09:26:30.325000 46013 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 0. CUDA driver allocated memory was 628031488 and is now 728694784. 2025-12-04T09:28:37.3620109Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3620746Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3622266Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3622610Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3623292Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3623803Z E1204 09:26:30.325000 46013 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.3624231Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3624781Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3625862Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3626444Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3627348Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3627691Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3628560Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3628991Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3629864Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3630302Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3631173Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3631596Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3632468Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3632923Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3634772Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:28:37.3635089Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3635856Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3637327Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3637654Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3638316Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3638815Z E1204 09:26:30.328000 46014 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.3639293Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3639780Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3640720Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3641272Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3642175Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3642517Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3643388Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3643822Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3644689Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3645124Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3646022Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3646410Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3647426Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3647829Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3650043Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 3. CUDA driver allocated memory was 422510592 and is now 615448576. 
2025-12-04T09:28:37.3650367Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3650965Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3652594Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3652919Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3653822Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3654407Z E1204 09:26:30.331000 46016 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.3654837Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3655334Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3656302Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3656777Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3657743Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3658117Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3659038Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3659497Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3660418Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3660905Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3661830Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3662270Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3663211Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3663670Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3665739Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424. 2025-12-04T09:28:37.3666208Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3666772Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3668108Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3668412Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3669392Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3669886Z E1204 09:26:30.338000 46015 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.3669983Z FAILED [8.5884s] [100%] 2025-12-04T09:28:37.3669989Z 2025-12-04T09:28:37.3670123Z =================================== FAILURES =================================== 2025-12-04T09:28:37.3670895Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.3671008Z Traceback (most recent call last): 2025-12-04T09:28:37.3671537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.3671651Z self._join_processes(fn) 2025-12-04T09:28:37.3672213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.3672347Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.3672934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.3673040Z raise RuntimeError(error) 2025-12-04T09:28:37.3673269Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.3673379Z Traceback (most recent call last): 2025-12-04T09:28:37.3673898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3674042Z getattr(self, test_name)() 2025-12-04T09:28:37.3674555Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3674646Z fn() 
2025-12-04T09:28:37.3675135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3675271Z method(*args, **kwargs) 2025-12-04T09:28:37.3675770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3675868Z method(*args, **kwargs) 2025-12-04T09:28:37.3676351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3676453Z with policy(): 2025-12-04T09:28:37.3676948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3677068Z raise RuntimeError(msg) 2025-12-04T09:28:37.3678558Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 0. CUDA driver allocated memory was 628031488 and is now 728694784. 2025-12-04T09:28:37.3678566Z 2025-12-04T09:28:37.3678919Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3680182Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3680188Z 2025-12-04T09:28:37.3680453Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3680459Z 2025-12-04T09:28:37.3680463Z 2025-12-04T09:28:37.3680693Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.3680953Z Process 0 terminated with exit code 10, terminating remaining processes. 
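The FutureWarnings emitted during this session deprecate FSDP.state_dict_type()/FSDP.set_state_dict_type() in favor of get_state_dict()/set_state_dict() from torch.distributed.checkpoint.state_dict (the doc link is in the warning text). The sketch below only illustrates the shape of those replacement calls; it is not the test's code, it assumes these APIs accept a plain module with a single-process gloo group, and the model/optimizer here are placeholders. An FSDP-wrapped module would pass through the same two calls.

import torch
import torch.distributed as dist
from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict


def main() -> None:
    # Single-process gloo group so the checkpoint state_dict APIs have a
    # default process group to work with.
    dist.init_process_group("gloo", init_method="tcp://127.0.0.1:29500",
                            rank=0, world_size=1)

    model = torch.nn.Linear(4, 4)
    optim = torch.optim.Adam(model.parameters(), lr=1e-3)

    # One step so the optimizer has some state worth capturing.
    model(torch.randn(2, 4)).sum().backward()
    optim.step()

    # Replacement the FutureWarning suggests for the deprecated
    # FSDP.set_state_dict_type(...) / model.state_dict() pattern.
    model_state, optim_state = get_state_dict(model, optim)

    # Symmetric load path.
    set_state_dict(model, optim,
                   model_state_dict=model_state,
                   optim_state_dict=optim_state)

    dist.destroy_process_group()


if __name__ == "__main__":
    main()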
2025-12-04T09:28:37.3681989Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2e17c39fb483ae46.xml - 2025-12-04T09:28:37.3682161Z =========================== short test summary info ============================ 2025-12-04T09:28:37.3683405Z FAILED [8.5884s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.3683538Z Traceback (most recent call last): 2025-12-04T09:28:37.3684084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3684206Z getattr(self, test_name)() 2025-12-04T09:28:37.3684739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3684831Z fn() 2025-12-04T09:28:37.3685341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3685449Z method(*args, **kwargs) 2025-12-04T09:28:37.3685948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3686060Z method(*args, **kwargs) 2025-12-04T09:28:37.3686559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3686662Z with policy(): 2025-12-04T09:28:37.3687165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3687310Z raise RuntimeError(msg) 2025-12-04T09:28:37.3688851Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 14848 on device 0. CUDA driver allocated memory was 628031488 and is now 728694784. 2025-12-04T09:28:37.3688915Z 2025-12-04T09:28:37.3689128Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3690239Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3690245Z 2025-12-04T09:28:37.3690507Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3690809Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:28:37.3690976Z ======================= 1 failed, 14 deselected in 8.80s ======================= 2025-12-04T09:28:37.3691070Z Got exit code 1 2025-12-04T09:28:37.3692130Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.3692492Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:37.3693225Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-9ad7d4c20da7406b.xml 2025-12-04T09:28:37.3693379Z ============================= test session starts ============================== 2025-12-04T09:28:37.3693857Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.3693979Z cachedir: .pytest_cache 2025-12-04T09:28:37.3694495Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.3694683Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.3694804Z configfile: pytest.ini 2025-12-04T09:28:37.3695340Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.3696606Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.3696736Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.3697957Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.3698123Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.3698274Z collected 15 items / 12 deselected / 3 selected 2025-12-04T09:28:37.3698425Z stepcurrent: skipping 12 already run items. 2025-12-04T09:28:37.3698538Z Running 3 items in this shard 2025-12-04T09:28:37.3698544Z 2025-12-04T09:28:37.3699811Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda I1204 09:26:36.879000 46294 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 46346 2025-12-04T09:28:37.3700319Z I1204 09:26:36.880000 46294 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 46347 2025-12-04T09:28:37.3700807Z I1204 09:26:36.881000 46294 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 46348 2025-12-04T09:28:37.3701329Z I1204 09:26:36.882000 46294 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 46349 2025-12-04T09:28:37.3703716Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3703870Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3706403Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3706517Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3708603Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3708709Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3711022Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3711128Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3713184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3713310Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3714980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3715101Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3716755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.3716876Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3718533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3718714Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3719130Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3719628Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3720568Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3721048Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3721979Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3722345Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3723251Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3723693Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3724607Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3725110Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3726018Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3726418Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3727318Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3727773Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3729502Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 632225792 and is now 720306176. 2025-12-04T09:28:37.3729832Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3730443Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3731761Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3732117Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3732804Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3733410Z E1204 09:26:43.642000 46346 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.3733993Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3734510Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3735481Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3735967Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3736926Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3737296Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3738232Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3738692Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3739737Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3740273Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3741213Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3741629Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3742640Z E1204 09:26:43.644000 46347 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3743118Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3744914Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 518979584 and is now 611254272. 2025-12-04T09:28:37.3745260Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3745990Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3747292Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3747670Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3748361Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3748863Z E1204 09:26:43.644000 46347 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.3749270Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3749757Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3750695Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3751279Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3752261Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3752587Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3753418Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3753823Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3754709Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3755117Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3755949Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3756314Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3757139Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3757558Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3759128Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:37.3759435Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3759991Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3761222Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3761620Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3762227Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3762695Z E1204 09:26:43.645000 46349 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.3763065Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3763522Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3764385Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3764804Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3765660Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3765985Z 
E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3766813Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3767221Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3768102Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3768508Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3769324Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3769701Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3770526Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3770943Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3772517Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 527368192 and is now 611254272. 
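The leak checker's message above reports, per device, the CUDA caching-allocator and driver-level allocations measured before and after the test. As a quick sanity check on the magnitudes, the small Python sketch below simply recomputes the deltas from the device 0 figures printed in the first attempt above; the byte counts are copied verbatim from the log, and the snippet itself is only an illustration added here, not part of the test harness.

# Recompute the memory deltas the leak check reported for device 0
# in the first attempt above (byte counts copied verbatim from the log).
caching_before, caching_after = 0, 2560                # CUDA caching allocator, bytes
driver_before, driver_after = 632225792, 720306176     # CUDA driver allocation, bytes

caching_delta = caching_after - caching_before         # 2560 bytes still allocated
driver_delta = driver_after - driver_before            # 88080384 bytes

print(f"caching allocator delta: {caching_delta} bytes")
print(f"driver delta: {driver_delta} bytes ({driver_delta / 2**20:.1f} MiB)")  # 84.0 MiB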
2025-12-04T09:28:37.3772820Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3773441Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3774978Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3775339Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3776021Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3776544Z E1204 09:26:43.649000 46348 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.3776641Z FAILED [8.9099s] [ 33%] 2025-12-04T09:28:37.3776648Z 2025-12-04T09:28:37.3776807Z =================================== FAILURES =================================== 2025-12-04T09:28:37.3777271Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:37.3777397Z Traceback (most recent call last): 2025-12-04T09:28:37.3777944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.3778053Z self._join_processes(fn) 2025-12-04T09:28:37.3778803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.3778962Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.3779563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.3779682Z raise RuntimeError(error) 2025-12-04T09:28:37.3779913Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.3780034Z Traceback (most recent call last): 2025-12-04T09:28:37.3781141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3781261Z getattr(self, test_name)() 2025-12-04T09:28:37.3781805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3781897Z fn() 2025-12-04T09:28:37.3782400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3782511Z method(*args, **kwargs) 2025-12-04T09:28:37.3783013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3783115Z method(*args, **kwargs) 2025-12-04T09:28:37.3783628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3783723Z with policy(): 2025-12-04T09:28:37.3784236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3784342Z raise RuntimeError(msg) 
2025-12-04T09:28:37.3785703Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 518979584 and is now 611254272. 2025-12-04T09:28:37.3785711Z 2025-12-04T09:28:37.3785939Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3786850Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3786896Z 2025-12-04T09:28:37.3787171Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3787177Z 2025-12-04T09:28:37.3787338Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.3787461Z Traceback (most recent call last): 2025-12-04T09:28:37.3788054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3788163Z getattr(self, test_name)() 2025-12-04T09:28:37.3788711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3788800Z fn() 2025-12-04T09:28:37.3789297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3789410Z method(*args, **kwargs) 2025-12-04T09:28:37.3789910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3790015Z method(*args, **kwargs) 2025-12-04T09:28:37.3790527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3790627Z with policy(): 2025-12-04T09:28:37.3791195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3791287Z raise RuntimeError(msg) 2025-12-04T09:28:37.3792479Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:37.3792493Z 2025-12-04T09:28:37.3792684Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3793495Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3793500Z 2025-12-04T09:28:37.3793811Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3793818Z 2025-12-04T09:28:37.3793822Z 2025-12-04T09:28:37.3794015Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.3794259Z Process 1 terminated with exit code 10, terminating remaining processes. 
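The RuntimeError raised above comes from a memory-leak policy that snapshots CUDA memory usage before the test body runs and compares it afterwards, failing if either the caching allocator or the driver reports more memory allocated once the test returns. The context manager below is only a minimal sketch of that idea using public torch.cuda APIs; it is not PyTorch's internal policy from common_utils.py, and the name leak_check is made up for illustration.

import contextlib
import gc
import torch

@contextlib.contextmanager
def leak_check(device: int = 0):
    # Snapshot caching-allocator and driver-level usage before the block.
    torch.cuda.synchronize(device)
    caching_before = torch.cuda.memory_allocated(device)   # caching allocator, bytes
    free_before, total = torch.cuda.mem_get_info(device)   # driver-level view
    driver_before = total - free_before
    try:
        yield
    finally:
        # Give the allocator a chance to release memory, then re-measure.
        gc.collect()
        torch.cuda.empty_cache()
        torch.cuda.synchronize(device)
        caching_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after
        if caching_after > caching_before or driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: "
                f"caching {caching_before} -> {caching_after} B, "
                f"driver {driver_before} -> {driver_after} B"
            )

Wrapping a test body in "with leak_check(rank):" would raise a similar error whenever allocations survive past the end of the test, which is what the failure above is reporting for every rank.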
2025-12-04T09:28:37.3795083Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-9ad7d4c20da7406b.xml - 2025-12-04T09:28:37.3795232Z =========================== short test summary info ============================ 2025-12-04T09:28:37.3796183Z FAILED [8.9099s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.3796295Z Traceback (most recent call last): 2025-12-04T09:28:37.3796782Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3796880Z getattr(self, test_name)() 2025-12-04T09:28:37.3797351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3797444Z fn() 2025-12-04T09:28:37.3797886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3797978Z method(*args, **kwargs) 2025-12-04T09:28:37.3798431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3798550Z method(*args, **kwargs) 2025-12-04T09:28:37.3799184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3799277Z with policy(): 2025-12-04T09:28:37.3799752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3799896Z raise RuntimeError(msg) 2025-12-04T09:28:37.3801167Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 518979584 and is now 611254272. 
2025-12-04T09:28:37.3801173Z 2025-12-04T09:28:37.3801384Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3802242Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3802250Z 2025-12-04T09:28:37.3802510Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3802518Z 2025-12-04T09:28:37.3802669Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.3802779Z Traceback (most recent call last): 2025-12-04T09:28:37.3803294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3803398Z getattr(self, test_name)() 2025-12-04T09:28:37.3803894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3803986Z fn() 2025-12-04T09:28:37.3804459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3804562Z method(*args, **kwargs) 2025-12-04T09:28:37.3805031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3805180Z method(*args, **kwargs) 2025-12-04T09:28:37.3805658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3805747Z with policy(): 2025-12-04T09:28:37.3806220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3806333Z raise RuntimeError(msg) 2025-12-04T09:28:37.3807599Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:37.3807607Z 2025-12-04T09:28:37.3807815Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3808674Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3808681Z 2025-12-04T09:28:37.3808936Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3809099Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.3809258Z ======================= 1 failed, 12 deselected in 9.12s ======================= 2025-12-04T09:28:37.3809356Z Got exit code 1 2025-12-04T09:28:37.3809451Z Retrying single test... 
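After the failure the harness prints "Got exit code 1" and retries the single failing test, writing each attempt's results to a fresh XML report (the report file names above differ per attempt). A rough sketch of that behaviour, assuming nothing about the actual run_test.py implementation, is just a retry loop around the repro command printed in the log, with the leak check kept enabled through the environment:

import os
import subprocess

# Hypothetical sketch of the retry behaviour seen above: rerun the single
# failing test a few times with the leak check enabled, stopping on success.
cmd = [
    "python",
    "test/distributed/fsdp/test_fsdp_dtensor_state_dict.py",
    "TestFSDPWithDeviceMeshAndDTensorCUDA."
    "test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda",
]
env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")

for attempt in range(3):
    result = subprocess.run(cmd, env=env)
    print(f"attempt {attempt}: exit code {result.returncode}")
    if result.returncode == 0:
        break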
2025-12-04T09:28:37.3810252Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-5b391720a035fce0.xml 2025-12-04T09:28:37.3810430Z ============================= test session starts ============================== 2025-12-04T09:28:37.3810737Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.3810847Z cachedir: .pytest_cache 2025-12-04T09:28:37.3811328Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.3811438Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.3811540Z configfile: pytest.ini 2025-12-04T09:28:37.3812013Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.3813138Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.3813334Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.3814693Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.3814856Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.3815003Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.3816001Z stepcurrent: skipping 12 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3816113Z Running 1 items in this shard 2025-12-04T09:28:37.3816119Z 2025-12-04T09:28:37.3817384Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda I1204 09:26:50.199000 46627 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 46679 2025-12-04T09:28:37.3817949Z I1204 09:26:50.200000 46627 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 46680 2025-12-04T09:28:37.3818441Z I1204 09:26:50.201000 46627 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 46681 2025-12-04T09:28:37.3818936Z I1204 09:26:50.202000 46627 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 46682 2025-12-04T09:28:37.3821325Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.3821448Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3823821Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3823946Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3826453Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3826589Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3828703Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3828799Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3830342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3830461Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3831994Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3832104Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3833672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3833785Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3835300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3835409Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3835788Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3836247Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3837112Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3837546Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3838391Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3838726Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3839546Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3839984Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3840814Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3841243Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3842077Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3842444Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3843284Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3843690Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3845271Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 640614400 and is now 724500480. 
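Independent of the leak itself, every rank also emits the FutureWarning shown above: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated in favour of the get_state_dict()/set_state_dict() APIs from torch.distributed.checkpoint (see the API doc URL in the warning). A minimal migration sketch, assuming an already-wrapped FSDP model and its optimizer (model and optim below are placeholders, not objects from this test), would look roughly like this:

from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

# model: an FSDP-wrapped nn.Module, optim: its optimizer (placeholders here).
# Replaces the FSDP.set_state_dict_type(...) pattern flagged by the warning.
model_state, optim_state = get_state_dict(model, optim)

# ... save/load model_state and optim_state, e.g. via torch.distributed.checkpoint ...

set_state_dict(
    model,
    optim,
    model_state_dict=model_state,
    optim_state_dict=optim_state,
)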
2025-12-04T09:28:37.3845578Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3846139Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3847394Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3847695Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3848314Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3848774Z E1204 09:26:56.904000 46679 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.3849148Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3849599Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3850602Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3851035Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3852072Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3852416Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3853594Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3854212Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3855187Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3855645Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3856578Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3857015Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3857944Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3858418Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3860191Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:28:37.3860538Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3861221Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3862569Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3862906Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3863593Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3864106Z E1204 09:26:56.905000 46682 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.3864530Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3865044Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3866111Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3866678Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3867524Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3867877Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3868704Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3869136Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3869967Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3870371Z E1204 
09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3871201Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3871571Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3872393Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3872808Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3874378Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:37.3874685Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3875289Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3876492Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3876787Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3877393Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3877863Z E1204 09:26:56.908000 46680 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.3878240Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3878835Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3880120Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3880608Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3881568Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3882004Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3882948Z E1204 09:26:56.909000 
46681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3883441Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3884404Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3884861Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3885792Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3886224Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3887151Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3887621Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3889394Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 531562496 and is now 611254272. 
2025-12-04T09:28:37.3889815Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3890447Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3891987Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3892302Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3892949Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3893511Z E1204 09:26:56.909000 46681 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.3893773Z FAILED [8.5885s] [100%] 2025-12-04T09:28:37.3893782Z 2025-12-04T09:28:37.3893936Z =================================== FAILURES =================================== 2025-12-04T09:28:37.3894403Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:37.3894525Z Traceback (most recent call last): 2025-12-04T09:28:37.3895079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.3895195Z self._join_processes(fn) 2025-12-04T09:28:37.3895781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.3895969Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.3896578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.3896753Z raise RuntimeError(error) 2025-12-04T09:28:37.3896985Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.3897105Z Traceback (most recent call last): 2025-12-04T09:28:37.3897656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3897768Z getattr(self, test_name)() 2025-12-04T09:28:37.3898304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3898395Z fn() 2025-12-04T09:28:37.3898897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3899015Z method(*args, **kwargs) 2025-12-04T09:28:37.3899514Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3899619Z method(*args, **kwargs) 2025-12-04T09:28:37.3900132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3900231Z with policy(): 2025-12-04T09:28:37.3900744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3900853Z raise RuntimeError(msg) 
2025-12-04T09:28:37.3902209Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:37.3902218Z 2025-12-04T09:28:37.3902442Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3903409Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3903417Z 2025-12-04T09:28:37.3903696Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3903702Z 2025-12-04T09:28:37.3903706Z 2025-12-04T09:28:37.3903922Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.3904190Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:37.3905118Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-5b391720a035fce0.xml - 2025-12-04T09:28:37.3905285Z =========================== short test summary info ============================ 2025-12-04T09:28:37.3906516Z FAILED [8.5885s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:37.3906622Z Traceback (most recent call last): 2025-12-04T09:28:37.3907118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3907219Z getattr(self, test_name)() 2025-12-04T09:28:37.3907873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3907968Z fn() 2025-12-04T09:28:37.3908441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3908572Z method(*args, **kwargs) 2025-12-04T09:28:37.3909054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3909321Z method(*args, **kwargs) 2025-12-04T09:28:37.3909822Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3909948Z with policy(): 2025-12-04T09:28:37.3910432Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3910543Z raise RuntimeError(msg) 2025-12-04T09:28:37.3911856Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 531562496 and is now 611254272. 
2025-12-04T09:28:37.3911865Z 2025-12-04T09:28:37.3912082Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3912965Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3912972Z 2025-12-04T09:28:37.3913226Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3913410Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.3913577Z ======================= 1 failed, 14 deselected in 8.80s ======================= 2025-12-04T09:28:37.3913678Z Got exit code 1 2025-12-04T09:28:37.3913780Z Retrying single test... 2025-12-04T09:28:37.3914516Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7e0ad2dc0411fa40.xml 2025-12-04T09:28:37.3914686Z ============================= test session starts ============================== 2025-12-04T09:28:37.3915020Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.3915189Z cachedir: .pytest_cache 2025-12-04T09:28:37.3915691Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.3915808Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.3915922Z configfile: pytest.ini 2025-12-04T09:28:37.3916437Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.3917646Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.3917790Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.3918979Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.3919142Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.3919285Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.3920260Z stepcurrent: skipping 12 already run items. 
Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3920380Z Running 1 items in this shard 2025-12-04T09:28:37.3920386Z 2025-12-04T09:28:37.3921940Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda I1204 09:27:03.450000 46960 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 47012 2025-12-04T09:28:37.3922456Z I1204 09:27:03.451000 46960 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 47013 2025-12-04T09:28:37.3922921Z I1204 09:27:03.451000 46960 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 47014 2025-12-04T09:28:37.3923418Z I1204 09:27:03.452000 46960 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 47015 2025-12-04T09:28:37.3925887Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3926014Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3928329Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3928440Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3930726Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.3930889Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3933249Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.3933360Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.3935240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3935375Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3937103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3937228Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3939192Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.3939368Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3941366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.3941492Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.3941927Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3942454Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3943435Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3943930Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3944886Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3945256Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3946313Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3946758Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3947835Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3948270Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3949155Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3949548Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3950421Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3951174Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3952903Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 523173888 and is now 613351424. 
2025-12-04T09:28:37.3953324Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3953932Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3955307Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3955698Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3956621Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3957140Z E1204 09:27:10.159000 47014 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.3957548Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3958055Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3958998Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3959475Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3960404Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3960759Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3961662Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3962169Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3963085Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3963601Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3964505Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3964921Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3965824Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3966444Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3968246Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 1. CUDA driver allocated memory was 523173888 and is now 613351424. 2025-12-04T09:28:37.3968710Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3969342Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3970787Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3971658Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3972332Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3972844Z E1204 09:27:10.159000 47013 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.3973324Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3974005Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3974981Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3975459Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3976432Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3976797Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3977812Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3978274Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3979596Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3980059Z E1204 
09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3980989Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3981420Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3982695Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3983174Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.3985170Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 0. CUDA driver allocated memory was 640614400 and is now 720306176. 2025-12-04T09:28:37.3985664Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3986297Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.3988356Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.3988766Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.3989455Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.3989985Z E1204 09:27:10.159000 47012 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.3990525Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.3991026Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.3991970Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.3992432Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.3993370Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.3993726Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.3994726Z E1204 09:27:10.166000 
47015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3995319Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3996233Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.3997050Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.3998015Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.3998439Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.3999348Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.3999803Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4001637Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 
2025-12-04T09:28:37.4002768Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4003783Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4005680Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.4006014Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4006683Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4007202Z E1204 09:27:10.166000 47015 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.4007300Z FAILED [8.4837s] [100%] 2025-12-04T09:28:37.4007311Z 2025-12-04T09:28:37.4007454Z =================================== FAILURES =================================== 2025-12-04T09:28:37.4007972Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda _ 2025-12-04T09:28:37.4008090Z Traceback (most recent call last): 2025-12-04T09:28:37.4008627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.4008733Z self._join_processes(fn) 2025-12-04T09:28:37.4009364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.4009518Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.4010105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.4010431Z raise RuntimeError(error) 2025-12-04T09:28:37.4010667Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.4010783Z Traceback (most recent call last): 2025-12-04T09:28:37.4011309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4011418Z getattr(self, test_name)() 2025-12-04T09:28:37.4012276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4012378Z fn() 2025-12-04T09:28:37.4012869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4012981Z method(*args, **kwargs) 2025-12-04T09:28:37.4013544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4013820Z method(*args, **kwargs) 2025-12-04T09:28:37.4014340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4014440Z with policy(): 2025-12-04T09:28:37.4015025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4015153Z raise RuntimeError(msg) 
2025-12-04T09:28:37.4016620Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 523173888 and is now 613351424. 2025-12-04T09:28:37.4016671Z 2025-12-04T09:28:37.4016902Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4018129Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.4018175Z 2025-12-04T09:28:37.4018455Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4018461Z 2025-12-04T09:28:37.4018466Z 2025-12-04T09:28:37.4019059Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.4019322Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:37.4020891Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7e0ad2dc0411fa40.xml - 2025-12-04T09:28:37.4021066Z =========================== short test summary info ============================ 2025-12-04T09:28:37.4022622Z FAILED [8.4837s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.4022749Z Traceback (most recent call last): 2025-12-04T09:28:37.4023299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4023419Z getattr(self, test_name)() 2025-12-04T09:28:37.4023953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4024051Z fn() 2025-12-04T09:28:37.4024620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4024734Z method(*args, **kwargs) 2025-12-04T09:28:37.4025404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4025505Z method(*args, **kwargs) 2025-12-04T09:28:37.4026678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4026796Z with policy(): 2025-12-04T09:28:37.4027384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4027503Z raise RuntimeError(msg) 2025-12-04T09:28:37.4028816Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda! Caching allocator allocated memory was 0 and is now reported as 2560 on device 2. CUDA driver allocated memory was 523173888 and is now 613351424. 
2025-12-04T09:28:37.4028824Z 2025-12-04T09:28:37.4029036Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4029936Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.4029944Z 2025-12-04T09:28:37.4030198Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4030381Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.4030599Z ======================= 1 failed, 14 deselected in 8.70s ======================= 2025-12-04T09:28:37.4030692Z Got exit code 1 2025-12-04T09:28:37.4031513Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda 2025-12-04T09:28:37.4032284Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:37.4033073Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-ab2eda46e6c1c6d0.xml 2025-12-04T09:28:37.4033604Z ============================= test session starts ============================== 2025-12-04T09:28:37.4033980Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.4034097Z cachedir: .pytest_cache 2025-12-04T09:28:37.4034986Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.4035121Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.4035226Z configfile: pytest.ini 2025-12-04T09:28:37.4036070Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.4037670Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.4037809Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.4039088Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.4039241Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.4039474Z collected 15 items / 13 deselected / 2 selected 2025-12-04T09:28:37.4039624Z stepcurrent: skipping 13 already run items. 
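The FutureWarning printed by each rank above, and again in the retried run below, names its own replacement: the get_state_dict()/set_state_dict() helpers in torch.distributed.checkpoint.state_dict (the API doc and tutorial URLs are quoted in the warning text itself). A minimal sketch of that migration follows; it is not the test's actual code, and the Linear model and Adam optimizer are placeholders for whatever the test wraps in FSDP on each rank:

    # Sketch only: replaces the deprecated FSDP.set_state_dict_type(...) call with
    # the torch.distributed.checkpoint.state_dict helpers the warning points to.
    # In the real test this runs on every rank with the process group initialized.
    import torch
    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    model = torch.nn.Linear(4, 4)                      # placeholder for the FSDP-wrapped module
    optim = torch.optim.Adam(model.parameters())       # placeholder optimizer

    # One call pair covers FSDP1, FSDP2 and DDP, which is the point of the new API.
    model_sd, optim_sd = get_state_dict(model, optim)  # collect checkpointable state
    set_state_dict(
        model,
        optim,
        model_state_dict=model_sd,
        optim_state_dict=optim_sd,
    )
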
2025-12-04T09:28:37.4039735Z Running 2 items in this shard 2025-12-04T09:28:37.4039742Z 2025-12-04T09:28:37.4041363Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda I1204 09:27:16.690000 47293 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 47345 2025-12-04T09:28:37.4041999Z I1204 09:27:16.691000 47293 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 47346 2025-12-04T09:28:37.4042480Z I1204 09:27:16.691000 47293 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 47347 2025-12-04T09:28:37.4042969Z I1204 09:27:16.692000 47293 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 47348 2025-12-04T09:28:37.4045404Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.4045527Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.4048285Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.4048403Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.4051208Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.4051975Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.4054497Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.4054629Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.4056832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. 
If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.4056982Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.4058692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.4058819Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.4060787Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.4060919Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.4062869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.4062998Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.4063447Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4063958Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4065065Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4065771Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4066809Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4067149Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4068081Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4068540Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4069514Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4069922Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4070831Z E1204 09:27:23.320000 47346 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4071205Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4072042Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4072455Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4074038Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 527368192 and is now 615448576. 2025-12-04T09:28:37.4074339Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4074897Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4076169Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4076470Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4077085Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4077542Z E1204 09:27:23.320000 47346 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.4077934Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4078381Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4079605Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4080096Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4081056Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4081431Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4082437Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:28:37.4082937Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4083877Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4084329Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4085268Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4085685Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4086632Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4087093Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4088867Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 636420096 and is now 720306176. 2025-12-04T09:28:37.4089217Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4089913Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4091259Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4091672Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4092289Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4092748Z E1204 09:27:23.322000 47345 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.4093124Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4093842Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4094815Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4095299Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, 
test_name)() 2025-12-04T09:28:37.4096259Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4096675Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4097608Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4098097Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4099038Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4099499Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4100443Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4100865Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4101809Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4102273Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4104040Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 422510592 and is now 613351424. 
2025-12-04T09:28:37.4104387Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4105080Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4106498Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4106810Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4107471Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4107956Z E1204 09:27:23.322000 47348 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.4108352Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4108838Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4109822Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4110254Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4111104Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4111462Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4112317Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4112718Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4113551Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4113951Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4114791Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4115157Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4115977Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4116398Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4117962Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:37.4118320Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4118881Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4120067Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4120362Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4120969Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4121443Z E1204 09:27:23.329000 47347 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.4121532Z FAILED [8.5247s] [ 50%] 2025-12-04T09:28:37.4121538Z 2025-12-04T09:28:37.4121683Z =================================== FAILURES =================================== 2025-12-04T09:28:37.4122088Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.4122197Z Traceback (most recent call last): 2025-12-04T09:28:37.4122692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.4122793Z self._join_processes(fn) 2025-12-04T09:28:37.4123322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.4123478Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.4124014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.4124164Z raise RuntimeError(error) 2025-12-04T09:28:37.4124372Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.4124479Z Traceback (most recent call last): 2025-12-04T09:28:37.4124964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4125063Z getattr(self, test_name)() 2025-12-04T09:28:37.4125542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4125623Z fn() 2025-12-04T09:28:37.4126070Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4126172Z method(*args, **kwargs) 2025-12-04T09:28:37.4126620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4126716Z method(*args, **kwargs) 2025-12-04T09:28:37.4127165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4127248Z with policy(): 2025-12-04T09:28:37.4127707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4127799Z raise RuntimeError(msg) 2025-12-04T09:28:37.4128988Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 422510592 and is now 613351424. 2025-12-04T09:28:37.4129003Z 2025-12-04T09:28:37.4129197Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4130050Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4130058Z 2025-12-04T09:28:37.4130299Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4130303Z 2025-12-04T09:28:37.4130307Z 2025-12-04T09:28:37.4130501Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.4130741Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:37.4131564Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-ab2eda46e6c1c6d0.xml - 2025-12-04T09:28:37.4131714Z =========================== short test summary info ============================ 2025-12-04T09:28:37.4132666Z FAILED [8.5247s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.4132771Z Traceback (most recent call last): 2025-12-04T09:28:37.4133332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4133431Z getattr(self, test_name)() 2025-12-04T09:28:37.4134110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4134215Z fn() 2025-12-04T09:28:37.4134717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4134868Z method(*args, **kwargs) 2025-12-04T09:28:37.4135370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4135479Z method(*args, **kwargs) 2025-12-04T09:28:37.4136020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4136115Z with policy(): 2025-12-04T09:28:37.4136621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4136740Z raise RuntimeError(msg) 2025-12-04T09:28:37.4138093Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 422510592 and is now 613351424. 2025-12-04T09:28:37.4138102Z 2025-12-04T09:28:37.4138324Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4139231Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4139238Z 2025-12-04T09:28:37.4139511Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4139684Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.4139855Z ======================= 1 failed, 13 deselected in 8.74s ======================= 2025-12-04T09:28:37.4139961Z Got exit code 1 2025-12-04T09:28:37.4140065Z Retrying single test... 
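"Got exit code 1" followed by "Retrying single test..." reflects the shard runner's policy visible throughout this section: a failing test is re-run on its own, and only if the isolated re-run also fails is it reported as FAILED CONSISTENTLY (with continue-through-error set, the shard then moves on). A rough sketch of that control flow, assuming a hypothetical node id and pytest invocation; it is not the actual shard-runner code:

    # Rough sketch of the retry policy seen in this log ("Retrying single test...",
    # "FAILED CONSISTENTLY", continue-through-error). Not the real runner; the
    # pytest command line here is only illustrative.
    import subprocess
    import sys


    def run_pytest(node_id: str) -> int:
        return subprocess.call([sys.executable, "-m", "pytest", "-x", node_id])


    def run_with_retry(node_id: str, continue_through_error: bool = True) -> bool:
        if run_pytest(node_id) == 0:
            return True
        print("Got exit code 1")
        print("Retrying single test...")
        if run_pytest(node_id) == 0:
            return True          # flaky: passed when retried in isolation
        print(f"FAILED CONSISTENTLY: {node_id}")
        if not continue_through_error:
            raise SystemExit(1)  # stop the shard immediately
        return False             # record the failure but keep running other tests
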
2025-12-04T09:28:37.4140819Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7d4e43b394d06af0.xml 2025-12-04T09:28:37.4140989Z ============================= test session starts ============================== 2025-12-04T09:28:37.4141396Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.4141513Z cachedir: .pytest_cache 2025-12-04T09:28:37.4142025Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.4142147Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.4142259Z configfile: pytest.ini 2025-12-04T09:28:37.4142791Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.4144052Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.4144195Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.4145424Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.4145698Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.4145839Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.4146835Z stepcurrent: skipping 13 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4146931Z Running 1 items in this shard 2025-12-04T09:28:37.4146935Z 2025-12-04T09:28:37.4148047Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda I1204 09:27:29.959000 47626 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 47678 2025-12-04T09:28:37.4148534Z I1204 09:27:29.960000 47626 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 47679 2025-12-04T09:28:37.4148997Z I1204 09:27:29.961000 47626 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 47680 2025-12-04T09:28:37.4149437Z I1204 09:27:29.962000 47626 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 47681 2025-12-04T09:28:37.4151558Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.4151674Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.4153978Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.4154095Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.4156748Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.4156867Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.4159111Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.4159212Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.4160836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.4160960Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.4162771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.4162892Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.4164546Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.4164722Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.4166383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.4166501Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.4166920Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4167420Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4168365Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4168839Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4169769Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4170125Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4171030Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4171527Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4172436Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4172877Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4174028Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4174444Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4175381Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4175856Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4177628Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 628031488 and is now 722403328. 
2025-12-04T09:28:37.4177975Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4178829Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4180195Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4180596Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4181284Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4181811Z E1204 09:27:36.774000 47678 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.4182232Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4182750Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4183726Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4184213Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4185170Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4185538Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4186483Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4187009Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4187954Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4188414Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4189347Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4189766Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4190803Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4191262Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4193049Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.4193416Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4194010Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4195299Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4195612Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4196256Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4196751Z E1204 09:27:36.775000 47680 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.4197153Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4197644Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4198619Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4199043Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4199901Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4200229Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4201117Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4201698Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4202579Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4203010Z E1204 
09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4203883Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4204292Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4205167Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4205614Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4207287Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:37.4215471Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4216198Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4217642Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4217981Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4218675Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4219194Z E1204 09:27:36.778000 47681 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.4219629Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4220132Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4221108Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4221589Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4222546Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4222923Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4223989Z E1204 09:27:36.780000 47679 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4224465Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4225393Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4225942Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4226826Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4227314Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4228158Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4228566Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4230154Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 523173888 and is now 611254272. 
2025-12-04T09:28:37.4230488Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4231072Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4232256Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4232551Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4233175Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4233641Z E1204 09:27:36.780000 47679 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.4233730Z FAILED [8.7528s] [100%] 2025-12-04T09:28:37.4233748Z 2025-12-04T09:28:37.4233876Z =================================== FAILURES =================================== 2025-12-04T09:28:37.4234284Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.4234404Z Traceback (most recent call last): 2025-12-04T09:28:37.4234888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.4234992Z self._join_processes(fn) 2025-12-04T09:28:37.4235518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.4235645Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.4236254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.4236361Z raise RuntimeError(error) 2025-12-04T09:28:37.4236566Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.4236685Z Traceback (most recent call last): 2025-12-04T09:28:37.4237166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4237263Z getattr(self, test_name)() 2025-12-04T09:28:37.4237740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4237818Z fn() 2025-12-04T09:28:37.4238450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4238552Z method(*args, **kwargs) 2025-12-04T09:28:37.4239025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4239131Z method(*args, **kwargs) 2025-12-04T09:28:37.4239606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4239693Z with policy(): 2025-12-04T09:28:37.4240178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4240383Z raise RuntimeError(msg) 
2025-12-04T09:28:37.4241588Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.4241622Z 2025-12-04T09:28:37.4241814Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4242629Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4242660Z 2025-12-04T09:28:37.4242891Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4242896Z 2025-12-04T09:28:37.4242900Z 2025-12-04T09:28:37.4243092Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.4243336Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:37.4244161Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7d4e43b394d06af0.xml - 2025-12-04T09:28:37.4244326Z =========================== short test summary info ============================ 2025-12-04T09:28:37.4245274Z FAILED [8.7528s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.4245380Z Traceback (most recent call last): 2025-12-04T09:28:37.4245868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4245970Z getattr(self, test_name)() 2025-12-04T09:28:37.4246634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4246719Z fn() 2025-12-04T09:28:37.4247195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4247301Z method(*args, **kwargs) 2025-12-04T09:28:37.4247771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4247923Z method(*args, **kwargs) 2025-12-04T09:28:37.4248400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4248495Z with policy(): 2025-12-04T09:28:37.4248975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4249074Z raise RuntimeError(msg) 2025-12-04T09:28:37.4250337Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 
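The repro line printed by the harness (and repeated in each retry above and below) can also be driven from a small Python helper; the sketch assumes a PyTorch source checkout as the working directory and a machine with at least four CUDA devices, since the test spawns four ranks. Only the command and the environment variable are taken from the log; the helper itself is illustrative.

# Sketch: rerun the failing test locally with the leak check enabled.
import os
import subprocess

env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
cmd = [
    "python",
    "test/distributed/fsdp/test_fsdp_dtensor_state_dict.py",
    "TestFSDPWithDeviceMeshAndDTensorCUDA."
    "test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda",
]
result = subprocess.run(cmd, env=env)
print("exit code:", result.returncode)

As the log notes, setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 suppresses the repro banner instead.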
2025-12-04T09:28:37.4250356Z 2025-12-04T09:28:37.4250557Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4251409Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4251417Z 2025-12-04T09:28:37.4251675Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4251837Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.4252007Z ======================= 1 failed, 14 deselected in 8.97s ======================= 2025-12-04T09:28:37.4252099Z Got exit code 1 2025-12-04T09:28:37.4252198Z Retrying single test... 2025-12-04T09:28:37.4252925Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-d04e17113c0af8ba.xml 2025-12-04T09:28:37.4253106Z ============================= test session starts ============================== 2025-12-04T09:28:37.4253524Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.4253806Z cachedir: .pytest_cache 2025-12-04T09:28:37.4254322Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.4254512Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.4254619Z configfile: pytest.ini 2025-12-04T09:28:37.4255158Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.4256418Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.4256555Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.4257782Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.4257938Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.4258089Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.4259088Z stepcurrent: skipping 13 already run items. 
Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4259198Z Running 1 items in this shard 2025-12-04T09:28:37.4259204Z 2025-12-04T09:28:37.4260465Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda I1204 09:27:43.299000 47959 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 48011 2025-12-04T09:28:37.4261022Z I1204 09:27:43.300000 47959 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 48012 2025-12-04T09:28:37.4261514Z I1204 09:27:43.301000 47959 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 48013 2025-12-04T09:28:37.4262017Z I1204 09:27:43.302000 47959 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 48014 2025-12-04T09:28:37.4264402Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.4264525Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.4266989Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.4267097Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.4269184Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.4269316Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.4271439Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:80: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.4271535Z FSDP.set_state_dict_type( 2025-12-04T09:28:37.4273067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.4273186Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.4274713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.4274824Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.4276332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:28:37.4276503Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.4278194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:28:37.4278319Z device = _get_pg_default_device(group) 2025-12-04T09:28:37.4278871Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4279549Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4280531Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4281010Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4281981Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4282347Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4283285Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4283819Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4284768Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4285267Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4286202Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4286631Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4287565Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4288036Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4289801Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 638517248 and is now 722403328. 
2025-12-04T09:28:37.4290141Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4290769Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4292253Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4292583Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4293289Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4293788Z E1204 09:27:50.060000 48011 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.4294375Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4294885Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4295870Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4296352Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4297328Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4297696Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4298664Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4299123Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4300083Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4300548Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4301477Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4301905Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4302836Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4303306Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4305082Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 518979584 and is now 611254272. 2025-12-04T09:28:37.4305424Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4306149Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4307563Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4307889Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4308535Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4309028Z E1204 09:27:50.061000 48012 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.4309595Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4310084Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4311034Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4311494Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4312433Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4312819Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4313732Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4314199Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4315103Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4315553Z E1204 
09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4316545Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4316948Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4317828Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4318269Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4319991Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 529465344 and is now 611254272. 2025-12-04T09:28:37.4320291Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4320911Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4322091Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4322394Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4322998Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4323466Z E1204 09:27:50.063000 48013 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.4323842Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4324292Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4325158Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4325582Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4326436Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4326790Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4327616Z E1204 09:27:50.066000 48014 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4328056Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4328876Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4329288Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4330109Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4330487Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4331311Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4331718Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4333355Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 
2025-12-04T09:28:37.4333904Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4334543Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4335881Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4336226Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4336906Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4337425Z E1204 09:27:50.066000 48014 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.4337535Z FAILED [8.6002s] [100%] 2025-12-04T09:28:37.4337544Z 2025-12-04T09:28:37.4337687Z =================================== FAILURES =================================== 2025-12-04T09:28:37.4338155Z _ TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda _ 2025-12-04T09:28:37.4338277Z Traceback (most recent call last): 2025-12-04T09:28:37.4338819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.4338943Z self._join_processes(fn) 2025-12-04T09:28:37.4339522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.4339706Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.4340313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.4340430Z raise RuntimeError(error) 2025-12-04T09:28:37.4340727Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.4340848Z Traceback (most recent call last): 2025-12-04T09:28:37.4341380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4341499Z getattr(self, test_name)() 2025-12-04T09:28:37.4342033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4342127Z fn() 2025-12-04T09:28:37.4342630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4342739Z method(*args, **kwargs) 2025-12-04T09:28:37.4343253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4343360Z method(*args, **kwargs) 2025-12-04T09:28:37.4343864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4343966Z with policy(): 2025-12-04T09:28:37.4344469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4344583Z raise RuntimeError(msg) 
2025-12-04T09:28:37.4346125Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 638517248 and is now 722403328. 2025-12-04T09:28:37.4346134Z 2025-12-04T09:28:37.4346332Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4347183Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4347191Z 2025-12-04T09:28:37.4347420Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4347425Z 2025-12-04T09:28:37.4347429Z 2025-12-04T09:28:37.4347637Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.4347867Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:37.4348702Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-d04e17113c0af8ba.xml - 2025-12-04T09:28:37.4348854Z =========================== short test summary info ============================ 2025-12-04T09:28:37.4349800Z FAILED [8.6002s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.4349919Z Traceback (most recent call last): 2025-12-04T09:28:37.4350398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4350507Z getattr(self, test_name)() 2025-12-04T09:28:37.4350981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4351060Z fn() 2025-12-04T09:28:37.4351519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4351639Z method(*args, **kwargs) 2025-12-04T09:28:37.4352094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4352184Z method(*args, **kwargs) 2025-12-04T09:28:37.4352631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4352751Z with policy(): 2025-12-04T09:28:37.4353201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4353299Z raise RuntimeError(msg) 2025-12-04T09:28:37.4354502Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 638517248 and is now 722403328. 
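The FutureWarning repeated throughout these runs points at the newer torch.distributed.checkpoint.state_dict helpers as the replacement for FSDP.set_state_dict_type. A rough sketch of that migration follows; model and optimizer stand in for an FSDP-wrapped module and its optimizer, and the keyword names should be verified against the API doc URL printed in the warning.

# Sketch of the replacement the warning recommends (names per the linked
# torch.distributed.checkpoint docs; verify against that page).
from torch.distributed.checkpoint.state_dict import (
    StateDictOptions,
    get_state_dict,
    set_state_dict,
)

def roundtrip_state_dict(model, optimizer):
    # Fetch sharded (DTensor-backed) state dicts, then load them back.
    options = StateDictOptions(full_state_dict=False)
    model_sd, optim_sd = get_state_dict(model, optimizer, options=options)
    set_state_dict(
        model,
        optimizer,
        model_state_dict=model_sd,
        optim_state_dict=optim_sd,
        options=options,
    )
    return model_sd, optim_sd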
2025-12-04T09:28:37.4354510Z 2025-12-04T09:28:37.4354700Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4355518Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4355524Z 2025-12-04T09:28:37.4355759Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4355925Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.4356076Z ======================= 1 failed, 14 deselected in 8.81s ======================= 2025-12-04T09:28:37.4356164Z Got exit code 1 2025-12-04T09:28:37.4356910Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda 2025-12-04T09:28:37.4357274Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:37.4357943Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-a3b3387bd6019536.xml 2025-12-04T09:28:37.4358146Z ============================= test session starts ============================== 2025-12-04T09:28:37.4358454Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.4358556Z cachedir: .pytest_cache 2025-12-04T09:28:37.4359013Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.4359119Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.4359221Z configfile: pytest.ini 2025-12-04T09:28:37.4359696Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.4360811Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.4360937Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.4362017Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.4362162Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.4362293Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.4362421Z stepcurrent: skipping 14 already run items. 
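At this point the harness has marked the test FAILED CONSISTENTLY and, because continue-through-error is set, moves on to test_raises_warning_or_errors_cuda. The PytestCollectionWarning lines in the new session's preamble are benign: pytest treats any class matching its default "Test*" pattern as a test class and skips it with this warning when the class defines __init__, which every torch.nn.Module subclass does. A tiny illustration (only the class name matters; the Linear layer is made up, not taken from the real test file):

# Why pytest warns "cannot collect test class ... because it has a __init__
# constructor": the name matches the default "Test*" collection pattern, but
# the __init__ required by nn.Module makes pytest skip it. These classes are
# model fixtures, not test cases, so the warning can be ignored.
import torch

class TestDummyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(8, 8)  # illustrative layer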
2025-12-04T09:28:37.4362522Z Running 1 items in this shard 2025-12-04T09:28:37.4362527Z 2025-12-04T09:28:37.4363542Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_raises_warning_or_errors_cuda I1204 09:27:56.589000 48292 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 48344 2025-12-04T09:28:37.4364016Z I1204 09:27:56.590000 48292 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 48345 2025-12-04T09:28:37.4364480Z I1204 09:27:56.591000 48292 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 48346 2025-12-04T09:28:37.4364919Z I1204 09:27:56.592000 48292 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 48347 2025-12-04T09:28:37.4367142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.4367405Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:28:37.4369611Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.4369862Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:28:37.4372106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.4372354Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:28:37.4374902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.4375184Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:28:37.4375618Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4376128Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4377110Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4377589Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4378558Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4379147Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4380076Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4380618Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4381549Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4382019Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4382952Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4383382Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4384311Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4384770Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4386437Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 506396672 and is now 615448576. 
2025-12-04T09:28:37.4386851Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4387493Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4388678Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4389019Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4389705Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4390225Z E1204 09:28:03.351000 48347 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.4390752Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4391200Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4392068Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4392491Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4393343Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4393721Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4394579Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4394994Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4395819Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4396228Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4397058Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4397429Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4398264Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T09:28:37.4398674Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4400192Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:28:37.4400493Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4401058Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4402266Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4402576Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4403386Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4403871Z E1204 09:28:03.351000 48346 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.4404453Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4404934Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4405881Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4406346Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4407335Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4407701Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4408632Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4409085Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4410007Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4410458Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4411362Z E1204 09:28:03.352000 48344 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4411767Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4412675Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4413123Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4415062Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 628031488 and is now 720306176. 2025-12-04T09:28:37.4415403Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4416036Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4417221Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4417552Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4418250Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4418770Z E1204 09:28:03.352000 48344 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.4419201Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4419709Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4420673Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4421161Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4422147Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4422554Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4423507Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4423972Z E1204 
09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4424902Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4425358Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4426318Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4426684Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4427521Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4427928Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4429695Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 527368192 and is now 611254272. 2025-12-04T09:28:37.4430009Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4430597Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4431719Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4432032Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4432687Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4433173Z E1204 09:28:03.354000 48345 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.4433322Z FAILED [8.5905s] [100%] 2025-12-04T09:28:37.4433330Z 2025-12-04T09:28:37.4433478Z =================================== FAILURES =================================== 2025-12-04T09:28:37.4433799Z ___ TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda ____ 2025-12-04T09:28:37.4433913Z Traceback (most recent call last): 2025-12-04T09:28:37.4434430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.4434536Z self._join_processes(fn) 2025-12-04T09:28:37.4435292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.4435436Z 
self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.4436017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.4436155Z raise RuntimeError(error) 2025-12-04T09:28:37.4436389Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.4436501Z Traceback (most recent call last): 2025-12-04T09:28:37.4437029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4437134Z getattr(self, test_name)() 2025-12-04T09:28:37.4437646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4437745Z fn() 2025-12-04T09:28:37.4438231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4438330Z method(*args, **kwargs) 2025-12-04T09:28:37.4438827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4438928Z method(*args, **kwargs) 2025-12-04T09:28:37.4439416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4439509Z with policy(): 2025-12-04T09:28:37.4439995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4440109Z raise RuntimeError(msg) 2025-12-04T09:28:37.4441292Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 628031488 and is now 720306176. 2025-12-04T09:28:37.4441302Z 2025-12-04T09:28:37.4441652Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4442392Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4442399Z 2025-12-04T09:28:37.4442656Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4442670Z 2025-12-04T09:28:37.4442675Z 2025-12-04T09:28:37.4442886Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.4443138Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:37.4444043Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-a3b3387bd6019536.xml - 2025-12-04T09:28:37.4444206Z =========================== short test summary info ============================ 2025-12-04T09:28:37.4445125Z FAILED [8.5905s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_raises_warning_or_errors_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:37.4445243Z Traceback (most recent call last): 2025-12-04T09:28:37.4445769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4445881Z getattr(self, test_name)() 2025-12-04T09:28:37.4446394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4446480Z fn() 2025-12-04T09:28:37.4446982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4447108Z method(*args, **kwargs) 2025-12-04T09:28:37.4447601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4447701Z method(*args, **kwargs) 2025-12-04T09:28:37.4448209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4448311Z with policy(): 2025-12-04T09:28:37.4448804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4448907Z raise RuntimeError(msg) 2025-12-04T09:28:37.4450095Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 628031488 and is now 720306176. 2025-12-04T09:28:37.4450104Z 2025-12-04T09:28:37.4450310Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4451069Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4451076Z 2025-12-04T09:28:37.4451330Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4451512Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.4451679Z ======================= 1 failed, 14 deselected in 8.81s ======================= 2025-12-04T09:28:37.4451773Z Got exit code 1 2025-12-04T09:28:37.4451881Z Retrying single test... 
2025-12-04T09:28:37.4452607Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-06677f872919b29b.xml 2025-12-04T09:28:37.4452759Z ============================= test session starts ============================== 2025-12-04T09:28:37.4453099Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.4453268Z cachedir: .pytest_cache 2025-12-04T09:28:37.4454016Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.4454140Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.4454243Z configfile: pytest.ini 2025-12-04T09:28:37.4454784Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.4456038Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.4456179Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.4457396Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.4457550Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.4457700Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.4458543Z stepcurrent: skipping 14 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4458668Z Running 1 items in this shard 2025-12-04T09:28:37.4458673Z 2025-12-04T09:28:37.4459803Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_raises_warning_or_errors_cuda I1204 09:28:09.930000 48625 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 48677 2025-12-04T09:28:37.4460325Z I1204 09:28:09.931000 48625 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 48678 2025-12-04T09:28:37.4460825Z I1204 09:28:09.931000 48625 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 48679 2025-12-04T09:28:37.4461339Z I1204 09:28:09.932000 48625 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 48680 2025-12-04T09:28:37.4463863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.4464137Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:28:37.4466822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.4467075Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:28:37.4469483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.4469724Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:28:37.4471923Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.4472158Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:28:37.4472546Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4472995Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4473854Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4474278Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4475130Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4475495Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4476321Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4476761Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4477581Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4477983Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4478956Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4479506Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4480455Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4480912Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4482572Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 636420096 and is now 726597632. 
2025-12-04T09:28:37.4482907Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4483633Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4484839Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4485170Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4485865Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4486380Z E1204 09:28:16.660000 48677 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.4486808Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4487319Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4488285Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4488772Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4489721Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4490139Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4491075Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4491680Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4492508Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4492913Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4493998Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4494413Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4495359Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T09:28:37.4495813Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4497464Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 523173888 and is now 613351424. 2025-12-04T09:28:37.4497885Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4498514Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4499709Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4500037Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4503036Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4503593Z E1204 09:28:16.663000 48679 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.4504023Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4504533Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4505505Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4506069Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4506946Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4507320Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4508148Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4508583Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4509403Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4509812Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4510639Z E1204 09:28:16.664000 48678 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4511019Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4511848Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4512258Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4513711Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 518979584 and is now 613351424. 2025-12-04T09:28:37.4514041Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4514604Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4515657Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4515956Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4516612Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4517076Z E1204 09:28:16.664000 48678 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.4517448Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4517891Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4518748Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4519171Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4520025Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4520769Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4521623Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4522022Z E1204 
09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4522841Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4523256Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4524080Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4524457Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4525282Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4525685Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4527151Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:37.4527471Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4528032Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4529082Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4529382Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4530032Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4530491Z E1204 09:28:16.670000 48680 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.4530592Z FAILED [8.6342s] [100%] 2025-12-04T09:28:37.4530599Z 2025-12-04T09:28:37.4530726Z =================================== FAILURES =================================== 2025-12-04T09:28:37.4531027Z ___ TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda ____ 2025-12-04T09:28:37.4531132Z Traceback (most recent call last): 2025-12-04T09:28:37.4531611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.4531722Z self._join_processes(fn) 2025-12-04T09:28:37.4532240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.4532401Z 
self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.4532934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.4533060Z raise RuntimeError(error) 2025-12-04T09:28:37.4533365Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.4533471Z Traceback (most recent call last): 2025-12-04T09:28:37.4534162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4534282Z getattr(self, test_name)() 2025-12-04T09:28:37.4534813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4534909Z fn() 2025-12-04T09:28:37.4535415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4535520Z method(*args, **kwargs) 2025-12-04T09:28:37.4536030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4536133Z method(*args, **kwargs) 2025-12-04T09:28:37.4536636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4536742Z with policy(): 2025-12-04T09:28:37.4537250Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4537365Z raise RuntimeError(msg) 2025-12-04T09:28:37.4538583Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 523173888 and is now 613351424. 2025-12-04T09:28:37.4538595Z 2025-12-04T09:28:37.4538808Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4539615Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4539624Z 2025-12-04T09:28:37.4539888Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4539893Z 2025-12-04T09:28:37.4539898Z 2025-12-04T09:28:37.4540122Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.4540383Z Process 2 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:37.4541310Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-06677f872919b29b.xml - 2025-12-04T09:28:37.4541518Z =========================== short test summary info ============================ 2025-12-04T09:28:37.4542463Z FAILED [8.6342s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_raises_warning_or_errors_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:28:37.4542590Z Traceback (most recent call last): 2025-12-04T09:28:37.4543134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4543249Z getattr(self, test_name)() 2025-12-04T09:28:37.4543781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4543866Z fn() 2025-12-04T09:28:37.4544374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4544480Z method(*args, **kwargs) 2025-12-04T09:28:37.4545029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4545139Z method(*args, **kwargs) 2025-12-04T09:28:37.4545748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4545990Z with policy(): 2025-12-04T09:28:37.4546462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4546561Z raise RuntimeError(msg) 2025-12-04T09:28:37.4547716Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 523173888 and is now 613351424. 2025-12-04T09:28:37.4547722Z 2025-12-04T09:28:37.4547919Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4548745Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4548750Z 2025-12-04T09:28:37.4548987Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4549145Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:37.4549307Z ======================= 1 failed, 14 deselected in 8.85s ======================= 2025-12-04T09:28:37.4549391Z Got exit code 1 2025-12-04T09:28:37.4549486Z Retrying single test... 
2025-12-04T09:28:37.4550153Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-da2c1a3b7d1cdaf6.xml 2025-12-04T09:28:37.4550293Z ============================= test session starts ============================== 2025-12-04T09:28:37.4550607Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:37.4550704Z cachedir: .pytest_cache 2025-12-04T09:28:37.4551186Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:37.4551300Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:37.4551389Z configfile: pytest.ini 2025-12-04T09:28:37.4551866Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:37.4552983Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.4553098Z class TestDummyModel(torch.nn.Module): 2025-12-04T09:28:37.4554214Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py) 2025-12-04T09:28:37.4554353Z class TestDummyModelUneven(torch.nn.Module): 2025-12-04T09:28:37.4554491Z collected 15 items / 14 deselected / 1 selected 2025-12-04T09:28:37.4555243Z stepcurrent: skipping 14 already run items. Running only test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4555339Z Running 1 items in this shard 2025-12-04T09:28:37.4555344Z 2025-12-04T09:28:37.4556351Z distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_raises_warning_or_errors_cuda I1204 09:28:23.199000 48958 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 49010 2025-12-04T09:28:37.4556791Z I1204 09:28:23.200000 48958 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 49011 2025-12-04T09:28:37.4557262Z I1204 09:28:23.201000 48958 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 49012 2025-12-04T09:28:37.4557692Z I1204 09:28:23.202000 48958 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 49013 2025-12-04T09:28:37.4559954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.4560195Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:28:37.4562412Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.4562648Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:28:37.4564850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:28:37.4565110Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:28:37.4567328Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:28:37.4567562Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:28:37.4567974Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4568429Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4569291Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4569720Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4570565Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4570891Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4571741Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4572145Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4572995Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4573461Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4574541Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4574958Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4575896Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4576354Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4578000Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 531562496 and is now 615448576. 
2025-12-04T09:28:37.4578348Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4579254Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4580463Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4580799Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4581486Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4581998Z E1204 09:28:29.921000 49013 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:37.4582469Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4582978Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4583947Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4584428Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4585385Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4585748Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4586720Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4587213Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4588150Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4588600Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4589532Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4589947Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4591057Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T09:28:37.4591471Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4592934Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 1. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:37.4593237Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4593821Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4594888Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4595178Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4595781Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4596287Z E1204 09:28:29.921000 49011 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:37.4596662Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4597117Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4597973Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4598398Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4599236Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4599590Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4600423Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4600851Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4601681Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4602080Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4602904Z E1204 09:28:29.921000 49010 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4603280Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4604100Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4604512Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4605967Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 0. CUDA driver allocated memory was 640614400 and is now 720306176. 2025-12-04T09:28:37.4606269Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4606853Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4607918Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4608210Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4608816Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4609309Z E1204 09:28:29.921000 49010 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:37.4609684Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:37.4610135Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:37.4610991Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4611409Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:37.4612268Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4612617Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:37.4613512Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4614158Z E1204 
09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4615091Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4615541Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:37.4616467Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4616890Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:37.4617819Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4618287Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:37.4619927Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 2. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:28:37.4620265Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4620920Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4622112Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4622448Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:37.4623127Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4623685Z E1204 09:28:29.923000 49012 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:37.4623784Z FAILED [8.6011s] [100%] 2025-12-04T09:28:37.4623792Z 2025-12-04T09:28:37.4623936Z =================================== FAILURES =================================== 2025-12-04T09:28:37.4624281Z ___ TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda ____ 2025-12-04T09:28:37.4624400Z Traceback (most recent call last): 2025-12-04T09:28:37.4624943Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:37.4625052Z self._join_processes(fn) 2025-12-04T09:28:37.4625740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:37.4625885Z 
self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:37.4626560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:37.4626666Z raise RuntimeError(error) 2025-12-04T09:28:37.4626875Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.4627007Z Traceback (most recent call last): 2025-12-04T09:28:37.4627490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4627587Z getattr(self, test_name)() 2025-12-04T09:28:37.4628052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4628136Z fn() 2025-12-04T09:28:37.4628584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4628681Z method(*args, **kwargs) 2025-12-04T09:28:37.4629130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4629223Z method(*args, **kwargs) 2025-12-04T09:28:37.4629672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4629769Z with policy(): 2025-12-04T09:28:37.4630220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4630321Z raise RuntimeError(msg) 2025-12-04T09:28:37.4631401Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:28:37.4631407Z 2025-12-04T09:28:37.4631604Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4632297Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4632302Z 2025-12-04T09:28:37.4632575Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4632582Z 2025-12-04T09:28:37.4632586Z 2025-12-04T09:28:37.4632779Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:37.4633011Z Process 3 terminated with exit code 10, terminating remaining processes. 
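The "Started process N with pid ..." lines and the captured "Process 3 terminated with exit code 10, terminating remaining processes" message come from the multi-process test harness, which spawns one worker per rank, joins them, and turns a non-zero per-rank exit code (10 here) into the RuntimeError shown in the traceback. A rough sketch of that spawn-and-join pattern with torch.multiprocessing follows; the worker body run_one_test and the exit-code constant are placeholders, not the harness's real code.

import os
import sys
import torch.distributed as dist
import torch.multiprocessing as mp

WORLD_SIZE = 4
TEST_EXIT_FAILURE = 10  # assumed stand-in for the harness's failure exit code

def _worker(rank):
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=WORLD_SIZE)
    try:
        run_one_test(rank)  # hypothetical per-rank test body
    except Exception:
        # Mirror the harness: report failure through the process exit code.
        sys.exit(TEST_EXIT_FAILURE)
    finally:
        dist.destroy_process_group()

if __name__ == "__main__":
    ctx = mp.spawn(_worker, nprocs=WORLD_SIZE, join=False)
    # join() raises if any rank exits non-zero, analogous to _check_return_codes.
    ctx.join()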
2025-12-04T09:28:37.4633854Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-da2c1a3b7d1cdaf6.xml - 2025-12-04T09:28:37.4634005Z =========================== short test summary info ============================ 2025-12-04T09:28:37.4634865Z FAILED [8.6011s] distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_raises_warning_or_errors_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:37.4634977Z Traceback (most recent call last): 2025-12-04T09:28:37.4635463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:37.4635571Z getattr(self, test_name)() 2025-12-04T09:28:37.4636047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:37.4636136Z fn() 2025-12-04T09:28:37.4636584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4636677Z method(*args, **kwargs) 2025-12-04T09:28:37.4637133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:37.4637253Z method(*args, **kwargs) 2025-12-04T09:28:37.4637696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:37.4637793Z with policy(): 2025-12-04T09:28:37.4638245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:37.4638375Z raise RuntimeError(msg) 2025-12-04T09:28:37.4639455Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda! Caching allocator allocated memory was 0 and is now reported as 7680 on device 3. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:28:37.4639460Z 2025-12-04T09:28:37.4639652Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:37.4640343Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_dtensor_state_dict.py TestFSDPWithDeviceMeshAndDTensorCUDA.test_raises_warning_or_errors_cuda 2025-12-04T09:28:37.4640350Z 2025-12-04T09:28:37.4640582Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:37.4640746Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:28:37.4640900Z ======================= 1 failed, 14 deselected in 8.82s =======================
2025-12-04T09:28:37.4640988Z Got exit code 1
2025-12-04T09:28:37.4641606Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_raises_warning_or_errors_cuda
2025-12-04T09:28:37.4641967Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T09:28:37.4642639Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-467d89e082f97fc4.xml
2025-12-04T09:28:37.4642787Z ============================= test session starts ==============================
2025-12-04T09:28:37.4643093Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:28:37.4643193Z cachedir: .pytest_cache
2025-12-04T09:28:37.4643689Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:28:37.4643805Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:28:37.4643899Z configfile: pytest.ini
2025-12-04T09:28:37.4644370Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:28:37.4645487Z collecting ... /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:31: PytestCollectionWarning: cannot collect test class 'TestDummyModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py)
2025-12-04T09:28:37.4645636Z class TestDummyModel(torch.nn.Module):
2025-12-04T09:28:37.4646731Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_dtensor_state_dict.py:47: PytestCollectionWarning: cannot collect test class 'TestDummyModelUneven' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_dtensor_state_dict.py)
2025-12-04T09:28:37.4646871Z class TestDummyModelUneven(torch.nn.Module):
2025-12-04T09:28:37.4647002Z collected 15 items / 15 deselected / 0 selected
2025-12-04T09:28:37.4647134Z stepcurrent: skipping 15 already run items.
2025-12-04T09:28:37.4647232Z Running 0 items in this shard
2025-12-04T09:28:37.4647237Z
2025-12-04T09:28:37.4648061Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-467d89e082f97fc4.xml -
2025-12-04T09:28:37.4648216Z ============================ 15 deselected in 0.02s ============================
2025-12-04T09:28:37.4661557Z The following tests failed consistently: [
    'test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda',
    'test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda',
    'test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda',
    'test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda',
    'test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_False_cuda',
    'test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_is_even_sharded_model_True_cuda',
    'test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_False_cuda',
    'test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_is_even_sharded_model_True_cuda',
    'test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_False_cuda',
    'test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_is_even_sharded_model_True_cuda',
    'test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_False_cuda',
    'test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_is_even_sharded_model_True_cuda',
    'test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_False_cuda',
    'test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_fsdp_init_with_device_mesh_is_even_sharded_model_True_cuda',
    'test/distributed/fsdp/test_fsdp_dtensor_state_dict.py::TestFSDPWithDeviceMeshAndDTensorCUDA::test_raises_warning_or_errors_cuda']
2025-12-04T09:28:37.4661619Z
2025-12-04T09:28:37.4662354Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_dtensor_state_dict 1/1 (test/test-reports/distributed.fsdp.test_fsdp_dtensor_state_dict_1.1_e652baa949161530_.log)
2025-12-04T09:28:37.4662362Z
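The FutureWarning repeated in the sessions above recommends migrating from FSDP.state_dict_type()/FSDP.set_state_dict_type() to the torch.distributed.checkpoint.state_dict helpers it names. A minimal sketch of that replacement API follows; the Linear module and SGD optimizer are stand-ins for the FSDP-wrapped model and optimizer used by the test, not the test's actual objects.

import torch
from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

# Stand-ins for the FSDP-wrapped model and optimizer used in the test.
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Replacement for FSDP.state_dict_type()/set_state_dict_type(); the warning
# notes the same call is meant to cover FSDP1, FSDP2, and DDP.
model_state, optim_state = get_state_dict(model, optimizer)

# ... save or load the two dicts, e.g. with torch.distributed.checkpoint ...

set_state_dict(
    model,
    optimizer,
    model_state_dict=model_state,
    optim_state_dict=optim_state,
)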
2025-12-04T09:28:37.4662838Z Finished distributed/fsdp/test_fsdp_dtensor_state_dict 1/1 ... [2025-12-04 09:28:36.806224][2142.908350115], took 10.26min 2025-12-04T09:28:37.4663833Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-dcb6c7b6743de89e.xml 2025-12-04T09:28:37.4664845Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c0871d667bd4df8d.xml 2025-12-04T09:28:37.4665825Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-b1a99f4c33297699.xml 2025-12-04T09:28:37.4666804Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7332aead750b9bce.xml 2025-12-04T09:28:37.4667708Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c7d658062419b597.xml 2025-12-04T09:28:37.4668602Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-348cc3a828a50222.xml 2025-12-04T09:28:37.4669479Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-bb573131fa19ab29.xml 2025-12-04T09:28:37.4670343Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-16546bb6943a3c11.xml 2025-12-04T09:28:37.4671215Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-cdd2e74ccc0956b9.xml 2025-12-04T09:28:37.4672083Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-9650dbe5a6e76fd8.xml 2025-12-04T09:28:37.4672943Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-53bea78db525054e.xml 2025-12-04T09:28:37.4673813Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-470dd7f8801a129e.xml 2025-12-04T09:28:37.4674672Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-864712a0594b6ca2.xml 2025-12-04T09:28:37.4675567Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c88c69879eff0a17.xml 2025-12-04T09:28:37.4676432Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-ecd21e7500304b9f.xml 2025-12-04T09:28:37.4677302Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-e7cfa143d1c9be09.xml 2025-12-04T09:28:37.4678184Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-976f30802ad214bb.xml 2025-12-04T09:28:37.4679386Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2d8b05be053af669.xml 2025-12-04T09:28:37.4718045Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-ceb5badd22358e55.xml 2025-12-04T09:28:37.5047766Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-f5dcf7c66579f3c2.xml 2025-12-04T09:28:37.5326057Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-21e2c8920cf3865d.xml 2025-12-04T09:28:37.5628564Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-3dd1dab0649736e8.xml 2025-12-04T09:28:37.5907255Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-255818cdbe5fbd05.xml 2025-12-04T09:28:37.6183135Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2340b7a625d10704.xml 2025-12-04T09:28:37.6565580Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-6880e02fcbe22f17.xml 2025-12-04T09:28:37.6925276Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-a497c1942163e16f.xml 2025-12-04T09:28:37.7245894Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-a4ee3bf5f7a9a01f.xml 2025-12-04T09:28:37.7527316Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2d9ebf91db9daa02.xml 2025-12-04T09:28:37.7815557Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-04afe2c287023adc.xml 2025-12-04T09:28:37.8176335Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-80cc64b9f2eb85b8.xml 2025-12-04T09:28:37.8507845Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c38510c20e07f456.xml 2025-12-04T09:28:37.8806388Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-bec03360e514672a.xml 2025-12-04T09:28:37.9068118Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7929a6c5753a5bf7.xml 2025-12-04T09:28:37.9335997Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-b0deb68b75574955.xml 2025-12-04T09:28:37.9638874Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-eda6c23a06d3c574.xml 2025-12-04T09:28:37.9947519Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2e17c39fb483ae46.xml 2025-12-04T09:28:38.0237884Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-9ad7d4c20da7406b.xml 2025-12-04T09:28:38.0566203Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-5b391720a035fce0.xml 2025-12-04T09:28:38.0883606Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7e0ad2dc0411fa40.xml 2025-12-04T09:28:38.1208472Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-ab2eda46e6c1c6d0.xml 2025-12-04T09:28:38.1514573Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7d4e43b394d06af0.xml 2025-12-04T09:28:38.1956131Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-d04e17113c0af8ba.xml 2025-12-04T09:28:38.2207485Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-a3b3387bd6019536.xml 2025-12-04T09:28:38.2528116Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-06677f872919b29b.xml 2025-12-04T09:28:38.2816265Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-da2c1a3b7d1cdaf6.xml 2025-12-04T09:28:38.3106735Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-467d89e082f97fc4.xml 2025-12-04T09:28:38.6722235Z Uploading logs for 57116084892 to S3 2025-12-04T09:28:38.7824787Z Uploading artifacts took 0.44 seconds 2025-12-04T09:28:38.7825392Z distributed/fsdp/test_fsdp_dtensor_state_dict 1/1 failed! 2025-12-04T09:28:38.7827269Z Running distributed/fsdp/test_fsdp_core 1/2 ... [2025-12-04 09:28:38.782589][2144.884719288] 2025-12-04T09:28:38.7827841Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:28:38.7831459Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_core.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:28:38.782931] 2025-12-04T10:13:47.5546102Z 2025-12-04T10:13:47.5547473Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_core 1/2 (test/test-reports/distributed.fsdp.test_fsdp_core_1.2_d577d9d07b48d18d_.log) 2025-12-04T10:13:47.5549698Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4e48aa8d10589348.xml 2025-12-04T10:13:47.5551233Z ============================= test session starts ============================== 2025-12-04T10:13:47.5552561Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.5553555Z cachedir: .pytest_cache 2025-12-04T10:13:47.5554720Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.5555974Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.5556565Z configfile: pytest.ini 2025-12-04T10:13:47.5557752Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.5559042Z collecting ... 
collected 60 items 2025-12-04T10:13:47.5559698Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T10:13:47.5596062Z Running 33 items in this shard: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda, test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda, test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda, test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda, 
test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:47.5631685Z 2025-12-04T10:13:47.5633399Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda I1204 09:28:42.290000 49347 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 49399 2025-12-04T10:13:47.5636210Z I1204 09:28:42.291000 49347 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 49400 2025-12-04T10:13:47.5638085Z I1204 09:28:42.291000 49347 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 49401 2025-12-04T10:13:47.5639990Z I1204 09:28:42.292000 49347 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 49402 2025-12-04T10:13:47.5643083Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.5645553Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.5647976Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.5650411Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.5653003Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.5655742Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.5659085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.5662507Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.5666096Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.5669407Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.5672744Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.5677960Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.5681126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.5683734Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.5687089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.5690536Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.5691961Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.5694185Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.5697056Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.5699916Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.5702732Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.5705337Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.5707959Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.5710786Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.5713588Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.5716328Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T10:13:47.5719064Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.5721694Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.5724407Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.5727124Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.5731118Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 716111872 and is now 743374848. 2025-12-04T10:13:47.5734851Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.5736876Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.5740135Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.5742929Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.5745042Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.5747635Z [rank0]:E1204 09:28:49.125000 49399 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.5749516Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.5751480Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.5754271Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.5756972Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.5759732Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.5762253Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.5764769Z [rank3]:E1204 09:28:49.125000 49402 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.5767396Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.5770056Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.5772896Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.5775767Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.5778447Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.5781255Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.5783970Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.5787983Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 609157120 and is now 634322944. 
2025-12-04T10:13:47.5791761Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.5793692Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.5796885Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.5799570Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.5801644Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.5803944Z [rank3]:E1204 09:28:49.125000 49402 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.5805881Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.5807754Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.5810488Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.5813237Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.5816213Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.5818831Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.5821405Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.5824110Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.5826846Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.5829651Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.5832452Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.5835000Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.5837550Z [rank2]:E1204 09:28:49.126000 49401 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.5840199Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.5843969Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 611254272 and is now 634322944. 2025-12-04T10:13:47.5847493Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.5849416Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.5852585Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.5855632Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.5857739Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.5860177Z [rank2]:E1204 09:28:49.126000 49401 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.5862146Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.5864079Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.5867014Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.5869739Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.5872481Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.5875011Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.5877493Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.5926718Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.5929855Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.5932798Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.5936058Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.5938998Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.5942064Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.5945098Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.5949411Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T10:13:47.5953367Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.5955493Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.5959143Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.5962111Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.5964435Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.5966997Z [rank1]:E1204 09:28:49.127000 49400 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.5968421Z dist init r=2, world=4 2025-12-04T10:13:47.5968889Z dist init r=1, world=4 2025-12-04T10:13:47.5969345Z dist init r=3, world=4 2025-12-04T10:13:47.5969809Z dist init r=0, world=4 2025-12-04T10:13:47.5972214Z [rank0]:[W1204 09:28:49.661114063 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.5975001Z FAILED [8.5203s] [ 3%] 2025-12-04T10:13:47.5975343Z 2025-12-04T10:13:47.5975595Z =================================== FAILURES =================================== 2025-12-04T10:13:47.5976650Z ____ TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda ____ 2025-12-04T10:13:47.5977664Z Traceback (most recent call last): 2025-12-04T10:13:47.5979314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.5980779Z self._join_processes(fn) 2025-12-04T10:13:47.5982248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.5983877Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.5985500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.5987256Z raise RuntimeError(error) 2025-12-04T10:13:47.5988062Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:47.5988939Z Traceback (most recent call last): 2025-12-04T10:13:47.5990467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.5991898Z getattr(self, test_name)() 2025-12-04T10:13:47.5993235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.5994609Z fn() 2025-12-04T10:13:47.5995756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.5997228Z method(*args, **kwargs) 2025-12-04T10:13:47.5998484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.5999857Z method(*args, **kwargs) 2025-12-04T10:13:47.6001116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6002468Z with policy(): 2025-12-04T10:13:47.6003670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6005042Z raise RuntimeError(msg) 2025-12-04T10:13:47.6007565Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 611254272 and is now 634322944. 2025-12-04T10:13:47.6010157Z 2025-12-04T10:13:47.6010551Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6012369Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6014210Z 2025-12-04T10:13:47.6014697Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6015447Z 2025-12-04T10:13:47.6015455Z 2025-12-04T10:13:47.6015856Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.6016969Z Process 2 terminated with exit code 10, terminating remaining processes. 
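Regarding the ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit"): it concerns cleanup rather than the leak itself. A small, hypothetical teardown sketch of the call it asks for:

import torch.distributed as dist

def teardown_distributed() -> None:
    # Explicitly destroy the default process group before the process exits,
    # which is what the ProcessGroupNCCL warning requests.
    if dist.is_initialized():
        dist.destroy_process_group()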
2025-12-04T10:13:47.6019196Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4e48aa8d10589348.xml - 2025-12-04T10:13:47.6021254Z =========================== short test summary info ============================ 2025-12-04T10:13:47.6023396Z FAILED [8.5203s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:47.6025767Z Traceback (most recent call last): 2025-12-04T10:13:47.6027049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6028369Z getattr(self, test_name)() 2025-12-04T10:13:47.6029592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6030855Z fn() 2025-12-04T10:13:47.6031898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6033145Z method(*args, **kwargs) 2025-12-04T10:13:47.6034306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6035564Z method(*args, **kwargs) 2025-12-04T10:13:47.6036716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6038049Z with policy(): 2025-12-04T10:13:47.6039136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6040399Z raise RuntimeError(msg) 2025-12-04T10:13:47.6042608Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 611254272 and is now 634322944. 2025-12-04T10:13:47.6044788Z 2025-12-04T10:13:47.6045119Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6047075Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6048407Z 2025-12-04T10:13:47.6048840Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6049705Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:47.6050408Z ============================== 1 failed in 8.74s =============================== 2025-12-04T10:13:47.6051047Z Got exit code 1 2025-12-04T10:13:47.6051461Z Retrying single test... 
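For context on the numbers in the RuntimeError above: PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 makes the harness compare caching-allocator and driver-level memory before and after the test. The following is only a rough illustration of that comparison, not PyTorch's actual leak checker (the function name and the threshold-free check are assumptions):

import torch

def ran_without_cuda_leak(fn, device: int = 0) -> bool:
    # Snapshot caching-allocator and driver-level usage before running fn.
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)
    driver_before = total - free_before

    fn()

    # Growth in both counters is what the harness reports as
    # "CUDA driver API confirmed a leak".
    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after
    return not (alloc_after > alloc_before and driver_after > driver_before)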
2025-12-04T10:13:47.6052749Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3193e57821c2ebca.xml 2025-12-04T10:13:47.6054573Z ============================= test session starts ============================== 2025-12-04T10:13:47.6055755Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.6056824Z cachedir: .pytest_cache 2025-12-04T10:13:47.6058063Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.6059310Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.6059843Z configfile: pytest.ini 2025-12-04T10:13:47.6060988Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.6062470Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:47.6064214Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6066017Z Running 1 items in this shard 2025-12-04T10:13:47.6066368Z 2025-12-04T10:13:47.6068116Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda I1204 09:28:55.690000 49684 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 49736 2025-12-04T10:13:47.6070946Z I1204 09:28:55.691000 49684 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 49737 2025-12-04T10:13:47.6072651Z I1204 09:28:55.691000 49684 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 49738 2025-12-04T10:13:47.6074506Z I1204 09:28:55.692000 49684 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 49739 2025-12-04T10:13:47.6076872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.6079262Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.6081519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:47.6083527Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.6085041Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.6086523Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.6087980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.6089503Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.6090967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.6092788Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.6095908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.6097922Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.6100157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.6103848Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.6107674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:47.6111541Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.6112846Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.6114880Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.6117936Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6120980Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.6123921Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6126561Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.6129235Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6131981Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6135147Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6138241Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6141410Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6144439Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.6147485Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6150270Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.6154248Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 707723264 and is now 743374848. 
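Separately, the enable_nested_tensor UserWarning repeated above fires because the encoder layers were not constructed with batch_first=True. A minimal, hypothetical construction that keeps the nested-tensor fast path available (the dimensions are placeholders):

import torch.nn as nn

# Building the layer with batch_first=True lets TransformerEncoder keep
# enable_nested_tensor active instead of falling back and warning.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6, enable_nested_tensor=True)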
2025-12-04T10:13:47.6158020Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6160022Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6163397Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6166190Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6168280Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6170712Z [rank0]:E1204 09:29:02.584000 49736 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.6172674Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.6174957Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.6178220Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6181653Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.6184860Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6187957Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.6190961Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6193729Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6196492Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6199348Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6202103Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6204783Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.6207487Z [rank1]:E1204 09:29:02.587000 49737 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6210250Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.6214566Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T10:13:47.6218894Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6221235Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6224987Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6227983Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6229970Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6232245Z [rank1]:E1204 09:29:02.587000 49737 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.6234062Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.6235884Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.6238641Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6241333Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.6244129Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6246675Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.6249214Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6251774Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6254856Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6257640Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6260005Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6261571Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.6263140Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6264737Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.6268219Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 609157120 and is now 634322944. 2025-12-04T10:13:47.6272077Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6273845Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6276087Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6278473Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6280739Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6283182Z [rank3]:E1204 09:29:02.589000 49739 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.6284506Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.6285651Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.6287330Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6289122Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.6290782Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6292373Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.6294081Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6295722Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6297323Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6298924Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6300517Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6302064Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.6303608Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6305253Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.6307585Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 604962816 and is now 634322944. 
2025-12-04T10:13:47.6309484Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6310529Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6312222Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6313659Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6314753Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6316004Z [rank2]:E1204 09:29:02.589000 49738 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.6316708Z dist init r=0, world=4 2025-12-04T10:13:47.6316971Z dist init r=1, world=4 2025-12-04T10:13:47.6317222Z dist init r=3, world=4 2025-12-04T10:13:47.6317443Z dist init r=2, world=4 2025-12-04T10:13:47.6318664Z [rank0]:[W1204 09:29:02.098284315 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.6319871Z FAILED [9.1782s] [100%] 2025-12-04T10:13:47.6320024Z 2025-12-04T10:13:47.6320155Z =================================== FAILURES =================================== 2025-12-04T10:13:47.6320656Z ____ TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda ____ 2025-12-04T10:13:47.6321133Z Traceback (most recent call last): 2025-12-04T10:13:47.6321814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.6322504Z self._join_processes(fn) 2025-12-04T10:13:47.6323188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.6323977Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.6324754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.6325507Z raise RuntimeError(error) 2025-12-04T10:13:47.6325899Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.6326320Z Traceback (most recent call last): 2025-12-04T10:13:47.6326993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6327682Z getattr(self, test_name)() 2025-12-04T10:13:47.6328335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6328998Z fn() 2025-12-04T10:13:47.6329560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6330240Z method(*args, **kwargs) 2025-12-04T10:13:47.6330852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T10:13:47.6331510Z method(*args, **kwargs) 2025-12-04T10:13:47.6332117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6332791Z with policy(): 2025-12-04T10:13:47.6333448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6334349Z raise RuntimeError(msg) 2025-12-04T10:13:47.6335747Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 707723264 and is now 743374848. 2025-12-04T10:13:47.6337068Z 2025-12-04T10:13:47.6337280Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6338286Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6339073Z 2025-12-04T10:13:47.6339340Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6339735Z 2025-12-04T10:13:47.6339740Z 2025-12-04T10:13:47.6339959Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.6340571Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.6341764Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3193e57821c2ebca.xml - 2025-12-04T10:13:47.6342851Z =========================== short test summary info ============================ 2025-12-04T10:13:47.6343971Z FAILED [9.1782s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.6345089Z Traceback (most recent call last): 2025-12-04T10:13:47.6346066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6346760Z getattr(self, test_name)() 2025-12-04T10:13:47.6347400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6348063Z fn() 2025-12-04T10:13:47.6348614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6349265Z method(*args, **kwargs) 2025-12-04T10:13:47.6349870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6350550Z method(*args, **kwargs) 2025-12-04T10:13:47.6351160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6351797Z with policy(): 2025-12-04T10:13:47.6352382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6353042Z raise RuntimeError(msg) 2025-12-04T10:13:47.6354265Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. 
CUDA driver allocated memory was 707723264 and is now 743374848. 2025-12-04T10:13:47.6355431Z 2025-12-04T10:13:47.6355615Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6356500Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6357226Z 2025-12-04T10:13:47.6357456Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6357967Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:47.6358411Z ======================= 1 failed, 32 deselected in 9.40s ======================= 2025-12-04T10:13:47.6358769Z Got exit code 1 2025-12-04T10:13:47.6358989Z Retrying single test... 2025-12-04T10:13:47.6359686Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9a7469c5b46925c2.xml 2025-12-04T10:13:47.6360490Z ============================= test session starts ============================== 2025-12-04T10:13:47.6361050Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.6361559Z cachedir: .pytest_cache 2025-12-04T10:13:47.6362164Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.6362836Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.6363135Z configfile: pytest.ini 2025-12-04T10:13:47.6363758Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.6364527Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:47.6365473Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6366331Z Running 1 items in this shard 2025-12-04T10:13:47.6366511Z 2025-12-04T10:13:47.6367427Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda I1204 09:29:09.650000 50021 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 50073 2025-12-04T10:13:47.6368879Z I1204 09:29:09.651000 50021 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 50074 2025-12-04T10:13:47.6369907Z I1204 09:29:09.651000 50021 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 50075 2025-12-04T10:13:47.6370895Z I1204 09:29:09.652000 50021 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 50076 2025-12-04T10:13:47.6372547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.6374126Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.6376101Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.6378085Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.6379813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.6381294Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.6382747Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.6384299Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.6386240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.6388258Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.6390232Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.6392162Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.6393494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.6394978Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.6396786Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:47.6398654Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.6399349Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.6400392Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.6401989Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6403520Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.6405042Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6406489Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.6407961Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6409353Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6410744Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6412131Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6413575Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6415300Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.6416838Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6418442Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.6420701Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 718209024 and is now 743374848. 
2025-12-04T10:13:47.6422822Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6423978Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6425864Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6427389Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6428462Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6429690Z [rank0]:E1204 09:29:16.564000 50073 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.6430718Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.6431704Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.6433175Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6434614Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.6436100Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6437438Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.6438754Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6440147Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6441532Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6442924Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6444348Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6445744Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.6447114Z [rank1]:E1204 09:29:16.565000 50074 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6448506Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.6450509Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 611254272 and is now 634322944. 2025-12-04T10:13:47.6452389Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6453471Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6455514Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6457102Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6458312Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6459726Z [rank1]:E1204 09:29:16.565000 50074 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.6460855Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.6461952Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.6463602Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6465260Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.6466974Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6468313Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.6469615Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6471007Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6472393Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6473808Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6475201Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6476574Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.6477944Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6479718Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.6481978Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T10:13:47.6484098Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6485242Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6487172Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6488794Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6490092Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6491589Z [rank3]:E1204 09:29:16.567000 50076 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.6492597Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.6493830Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.6495603Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6497253Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.6498886Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6500426Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.6501933Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6503535Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6505175Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6506902Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6508325Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6509711Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.6511098Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6512520Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.6514525Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 607059968 and is now 634322944. 
2025-12-04T10:13:47.6516410Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6517450Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6519175Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6520609Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6521687Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6522929Z [rank2]:E1204 09:29:16.572000 50075 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.6523633Z dist init r=1, world=4 2025-12-04T10:13:47.6523887Z dist init r=0, world=4 2025-12-04T10:13:47.6524125Z dist init r=3, world=4 2025-12-04T10:13:47.6524371Z dist init r=2, world=4 2025-12-04T10:13:47.6525587Z [rank0]:[W1204 09:29:16.076685833 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.6526819Z FAILED [9.5420s] [100%] 2025-12-04T10:13:47.6526979Z 2025-12-04T10:13:47.6527111Z =================================== FAILURES =================================== 2025-12-04T10:13:47.6527638Z ____ TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda ____ 2025-12-04T10:13:47.6528137Z Traceback (most recent call last): 2025-12-04T10:13:47.6528826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.6529539Z self._join_processes(fn) 2025-12-04T10:13:47.6530252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.6531055Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.6531831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.6532599Z raise RuntimeError(error) 2025-12-04T10:13:47.6533030Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.6533524Z Traceback (most recent call last): 2025-12-04T10:13:47.6534470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6535270Z getattr(self, test_name)() 2025-12-04T10:13:47.6536026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6536783Z fn() 2025-12-04T10:13:47.6537434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6538200Z method(*args, **kwargs) 2025-12-04T10:13:47.6538900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T10:13:47.6539664Z method(*args, **kwargs) 2025-12-04T10:13:47.6540374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6541122Z with policy(): 2025-12-04T10:13:47.6541792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6542555Z raise RuntimeError(msg) 2025-12-04T10:13:47.6543958Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 718209024 and is now 743374848. 2025-12-04T10:13:47.6545291Z 2025-12-04T10:13:47.6545632Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6546685Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6547401Z 2025-12-04T10:13:47.6547637Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6548005Z 2025-12-04T10:13:47.6548152Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.6548532Z Traceback (most recent call last): 2025-12-04T10:13:47.6549220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6549931Z getattr(self, test_name)() 2025-12-04T10:13:47.6550602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6551292Z fn() 2025-12-04T10:13:47.6551883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6552560Z method(*args, **kwargs) 2025-12-04T10:13:47.6553199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6553861Z method(*args, **kwargs) 2025-12-04T10:13:47.6554490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6555152Z with policy(): 2025-12-04T10:13:47.6555758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6556424Z raise RuntimeError(msg) 2025-12-04T10:13:47.6557663Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 604962816 and is now 634322944. 
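The `[rank0]:[W ... ProcessGroupNCCL.cpp:1553]` warning earlier in this run ("destroy_process_group() was not called before program exit, which can leak resources") is separate from the leak the checker reports, but it does point at missing teardown in the per-rank worker processes. Below is a minimal per-rank sketch that avoids that warning; the environment-variable handling and the NCCL backend choice are assumptions about how the workers are launched, not what the test harness actually does.

```python
# Sketch only: explicit process-group teardown so ProcessGroupNCCL does not warn at
# exit. RANK/LOCAL_RANK handling is a placeholder for whatever the launcher provides.
import os

import torch
import torch.distributed as dist


def main() -> None:
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)
    # assumes MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE are set by the launcher
    dist.init_process_group(backend="nccl")
    try:
        dist.barrier()  # stand-in for the real distributed work
    finally:
        dist.destroy_process_group()  # the teardown the warning asks for


if __name__ == "__main__":
    main()
```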
2025-12-04T10:13:47.6558893Z 2025-12-04T10:13:47.6559087Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6559983Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6560716Z 2025-12-04T10:13:47.6560957Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6561324Z 2025-12-04T10:13:47.6561328Z 2025-12-04T10:13:47.6561526Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.6562081Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.6563147Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9a7469c5b46925c2.xml - 2025-12-04T10:13:47.6564125Z =========================== short test summary info ============================ 2025-12-04T10:13:47.6565140Z FAILED [9.5420s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.6566106Z Traceback (most recent call last): 2025-12-04T10:13:47.6566811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6567515Z getattr(self, test_name)() 2025-12-04T10:13:47.6568189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6568869Z fn() 2025-12-04T10:13:47.6569445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6570103Z method(*args, **kwargs) 2025-12-04T10:13:47.6570739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6571409Z method(*args, **kwargs) 2025-12-04T10:13:47.6572057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6572727Z with policy(): 2025-12-04T10:13:47.6573401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6574317Z raise RuntimeError(msg) 2025-12-04T10:13:47.6575706Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 718209024 and is now 743374848. 
2025-12-04T10:13:47.6577037Z 2025-12-04T10:13:47.6577294Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6578312Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6579298Z 2025-12-04T10:13:47.6579581Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6579986Z 2025-12-04T10:13:47.6580156Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.6580584Z Traceback (most recent call last): 2025-12-04T10:13:47.6581378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6582183Z getattr(self, test_name)() 2025-12-04T10:13:47.6582931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6583712Z fn() 2025-12-04T10:13:47.6584367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6585186Z method(*args, **kwargs) 2025-12-04T10:13:47.6585908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6586713Z method(*args, **kwargs) 2025-12-04T10:13:47.6587421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6588159Z with policy(): 2025-12-04T10:13:47.6588836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6589606Z raise RuntimeError(msg) 2025-12-04T10:13:47.6591200Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T10:13:47.6592373Z 2025-12-04T10:13:47.6592566Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6593467Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T10:13:47.6594178Z 2025-12-04T10:13:47.6594413Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6594935Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
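The repro command printed above (`PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py ...`) re-runs the single test with the leak checker enabled. Conceptually the check is a per-device before/after comparison of caching-allocator and driver memory, which is where numbers like "was 512 and is now reported as 19456" come from. A much-simplified stand-in is sketched below; the real implementation lives in torch/testing/_internal/common_utils.py and is more careful (it retries after garbage collection and cross-checks the driver), and `run_test_body` is a placeholder.

```python
# Much-simplified stand-in for the check behind PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1:
# snapshot per-device memory before the test body and fail if it grew afterwards.
import gc

import torch


def check_cuda_leak(run_test_body, device: int = 0) -> None:
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)     # caching-allocator bytes
    free_before, _total = torch.cuda.mem_get_info(device)  # driver-level free bytes

    run_test_body()

    gc.collect()
    torch.cuda.empty_cache()  # release cached-but-unused blocks back to the driver
    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _total = torch.cuda.mem_get_info(device)

    if alloc_after > alloc_before:
        raise RuntimeError(
            f"Caching allocator allocated memory was {alloc_before} "
            f"and is now reported as {alloc_after} on device {device}."
        )
    if free_after < free_before:
        # driver-side growth, e.g. 718209024 -> 743374848 in the failure above
        raise RuntimeError("CUDA driver allocated memory grew during the test.")
```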
2025-12-04T10:13:47.6595369Z ======================= 1 failed, 32 deselected in 9.76s =======================
2025-12-04T10:13:47.6595743Z Got exit code 1
2025-12-04T10:13:47.6596407Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda
2025-12-04T10:13:47.6597390Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:13:47.6598427Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-227ae9e59104394c.xml
2025-12-04T10:13:47.6599288Z ============================= test session starts ==============================
2025-12-04T10:13:47.6599878Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:13:47.6600399Z cachedir: .pytest_cache
2025-12-04T10:13:47.6601034Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:13:47.6601731Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:13:47.6602049Z configfile: pytest.ini
2025-12-04T10:13:47.6602679Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:13:47.6603473Z collecting ... collected 60 items / 1 deselected / 59 selected
2025-12-04T10:13:47.6603951Z stepcurrent: skipping 1 already run items.
2025-12-04T10:13:47.6604286Z Running 32 items in this shard
2025-12-04T10:13:47.6604482Z 
2025-12-04T10:13:47.6605472Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda I1204 09:29:23.599000 50358 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 50410
2025-12-04T10:13:47.6607031Z I1204 09:29:23.600000 50358 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 50411
2025-12-04T10:13:47.6608039Z I1204 09:29:23.601000 50358 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 50412
2025-12-04T10:13:47.6609039Z I1204 09:29:23.602000 50358 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 50413
2025-12-04T10:13:47.6610867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance)
2025-12-04T10:13:47.6612303Z self.encoder = TransformerEncoder(
2025-12-04T10:13:47.6614437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T10:13:47.6616492Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.6618046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.6619535Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.6621500Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.6623508Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.6625050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.6626682Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.6628435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.6630221Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.6631578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.6632898Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.6634655Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
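The UserWarning repeated for each rank above ("FSDP got the argument `device_id` cuda ... which does not have an explicit index") suggests two fixes: make the indexed device current with `torch.cuda.set_device()` before FSDP initialization, or pass an explicitly indexed device as `device_id`. A per-rank sketch of both is below; it assumes the default process group is already initialized, and `rank` stands in for however the launcher supplies it.

```python
# Sketch of the two fixes the FSDP UserWarning above suggests. Assumes the default
# process group is already initialized; `rank` is supplied by the launcher in practice.
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def wrap_model(rank: int) -> FSDP:
    # Fix 1: make the indexed device current before FSDP initialization
    torch.cuda.set_device(rank)

    # Fix 2: pass an explicitly indexed device instead of the bare "cuda" string
    model = nn.Linear(8, 8)
    return FSDP(model, device_id=torch.device("cuda", rank))
```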
2025-12-04T10:13:47.6636424Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.6637105Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.6638114Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.6639609Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6641059Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.6642541Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6643921Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.6645261Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6646677Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6648078Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6649493Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6650906Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6652287Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.6653908Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6655495Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.6657895Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 711917568 and is now 732889088. 
2025-12-04T10:13:47.6660122Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6661285Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6663335Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.6665033Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6666432Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6667683Z [rank0]:E1204 09:29:30.351000 50410 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.6668699Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.6669706Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.6671189Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6672685Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.6674170Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6675527Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.6676851Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6678262Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6680092Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6681687Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6683283Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6684818Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.6686365Z [rank1]:E1204 09:29:30.351000 50411 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6688010Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.6690367Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 623837184. 2025-12-04T10:13:47.6692544Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6693822Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6695816Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.6697505Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6698719Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6700113Z [rank1]:E1204 09:29:30.351000 50411 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.6701230Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.6702383Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.6704040Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6705706Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.6707265Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6708595Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.6709914Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6711307Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6712711Z [rank3]:E1204 09:29:30.352000 50413 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6714321Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6715790Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6717224Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.6719530Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6721029Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.6723227Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 607059968 and is now 623837184. 2025-12-04T10:13:47.6725327Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6726616Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6728980Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.6731282Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6732465Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6734090Z [rank3]:E1204 09:29:30.352000 50413 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.6735291Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.6736403Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.6738102Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6739728Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.6741351Z [rank2]:E1204 09:29:30.355000 50412 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6742868Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.6744354Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6746022Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6747488Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6749017Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6750412Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6751792Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.6753164Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6754552Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.6756660Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 623837184. 
2025-12-04T10:13:47.6758617Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6759643Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6761395Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.6762888Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6763961Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6765221Z [rank2]:E1204 09:29:30.355000 50412 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.6765933Z dist init r=3, world=4 2025-12-04T10:13:47.6766166Z dist init r=1, world=4 2025-12-04T10:13:47.6766396Z dist init r=2, world=4 2025-12-04T10:13:47.6766630Z dist init r=0, world=4 2025-12-04T10:13:47.6767791Z [rank0]:[W1204 09:29:30.889739341 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.6769007Z FAILED [8.3335s] [ 3%] 2025-12-04T10:13:47.6769165Z 2025-12-04T10:13:47.6769293Z =================================== FAILURES =================================== 2025-12-04T10:13:47.6769872Z _ TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda _ 2025-12-04T10:13:47.6770410Z Traceback (most recent call last): 2025-12-04T10:13:47.6771092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.6771787Z self._join_processes(fn) 2025-12-04T10:13:47.6772478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.6773294Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.6774299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.6775149Z raise RuntimeError(error) 2025-12-04T10:13:47.6775584Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.6776061Z Traceback (most recent call last): 2025-12-04T10:13:47.6776828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6777605Z getattr(self, test_name)() 2025-12-04T10:13:47.6778386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6779470Z fn() 2025-12-04T10:13:47.6780100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6780839Z method(*args, **kwargs) 2025-12-04T10:13:47.6781528Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6782257Z method(*args, **kwargs) 2025-12-04T10:13:47.6782944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6783672Z with policy(): 2025-12-04T10:13:47.6784408Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6785157Z raise RuntimeError(msg) 2025-12-04T10:13:47.6786635Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 607059968 and is now 623837184. 2025-12-04T10:13:47.6788037Z 2025-12-04T10:13:47.6788247Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6789325Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.6790211Z 2025-12-04T10:13:47.6790476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6790992Z 2025-12-04T10:13:47.6790998Z 2025-12-04T10:13:47.6791198Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.6791741Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.6792832Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-227ae9e59104394c.xml - 2025-12-04T10:13:47.6793917Z =========================== short test summary info ============================ 2025-12-04T10:13:47.6795109Z FAILED [8.3335s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.6796369Z Traceback (most recent call last): 2025-12-04T10:13:47.6797229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6798013Z getattr(self, test_name)() 2025-12-04T10:13:47.6798843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6799638Z fn() 2025-12-04T10:13:47.6800310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6801090Z method(*args, **kwargs) 2025-12-04T10:13:47.6801827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6802593Z method(*args, **kwargs) 2025-12-04T10:13:47.6803374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6804100Z with policy(): 2025-12-04T10:13:47.6804793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6805674Z raise RuntimeError(msg) 2025-12-04T10:13:47.6807108Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 607059968 and is now 623837184.
2025-12-04T10:13:47.6808440Z 
2025-12-04T10:13:47.6808646Z To execute this test, run the following from the base repo dir:
2025-12-04T10:13:47.6809778Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda
2025-12-04T10:13:47.6810644Z 
2025-12-04T10:13:47.6810916Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:13:47.6811558Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:13:47.6812121Z ======================= 1 failed, 1 deselected in 8.55s ========================
2025-12-04T10:13:47.6812604Z Got exit code 1
2025-12-04T10:13:47.6812961Z Retrying single test...
2025-12-04T10:13:47.6814135Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-38750597d70d3b79.xml
2025-12-04T10:13:47.6815124Z ============================= test session starts ==============================
2025-12-04T10:13:47.6815960Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:13:47.6816701Z cachedir: .pytest_cache
2025-12-04T10:13:47.6817473Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:13:47.6818377Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:13:47.6818869Z configfile: pytest.ini
2025-12-04T10:13:47.6819707Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:13:47.6820747Z collecting ... collected 60 items / 32 deselected / 28 selected
2025-12-04T10:13:47.6822066Z stepcurrent: skipping 1 already run items.
Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.6823285Z Running 1 items in this shard 2025-12-04T10:13:47.6823523Z 2025-12-04T10:13:47.6824789Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda I1204 09:29:36.900000 50679 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 50731 2025-12-04T10:13:47.6826669Z I1204 09:29:36.901000 50679 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 50732 2025-12-04T10:13:47.6827787Z I1204 09:29:36.901000 50679 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 50733 2025-12-04T10:13:47.6828925Z I1204 09:29:36.902000 50679 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 50734 2025-12-04T10:13:47.6830692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.6832123Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.6833544Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.6834970Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.6836834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.6838777Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.6840666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.6842559Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.6844017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.6845501Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.6847350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:47.6849261Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.6850728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.6852161Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.6854333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.6856490Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.6857426Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.6858674Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.6860421Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6862243Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.6864048Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6865775Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.6867404Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6868881Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6870415Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6871983Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6873505Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6875004Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.6876493Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:13:47.6878028Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.6880772Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 732889088. 2025-12-04T10:13:47.6883172Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6884468Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6886704Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.6888526Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6889869Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6891517Z [rank0]:E1204 09:29:43.616000 50731 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.6892645Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.6893959Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.6895793Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6897573Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.6899305Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6901051Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.6902625Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6904357Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6906312Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6907827Z [rank2]:E1204 09:29:43.618000 
50733 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6909348Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6910898Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.6912339Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6913880Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.6916096Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 609157120 and is now 623837184. 2025-12-04T10:13:47.6918268Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6919442Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6921307Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.6922931Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6924150Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6925499Z [rank2]:E1204 09:29:43.618000 50733 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.6926589Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.6927755Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.6929360Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6930910Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.6932524Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6934335Z [rank3]:E1204 09:29:43.618000 50734 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.6935921Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6937691Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6939400Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6941115Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6942874Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6944487Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.6946356Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.6947913Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.6950111Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 607059968 and is now 623837184. 
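The `device_id` warnings repeated above for every rank mean FSDP received a bare `cuda` device with no index. A minimal sketch of the fix the warning suggests, assuming one GPU per rank and an already-initialized process group (the helper name is illustrative, not part of the test):

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_with_fsdp(module, rank):
        # Pin this process to its GPU before FSDP initialization, as the warning asks.
        torch.cuda.set_device(rank)
        # Passing an indexed device (or the integer rank) avoids the
        # "`device_id` cuda ... does not have an explicit index" warning.
        return FSDP(module, device_id=torch.device("cuda", rank))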
2025-12-04T10:13:47.6952275Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6953481Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.6955512Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.6958235Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.6960096Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.6962345Z [rank3]:E1204 09:29:43.618000 50734 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.6964200Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.6966092Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.6969302Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.6972392Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.6975784Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.6979163Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.6981913Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6984858Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6988042Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.6991329Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.6993933Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.6996705Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.6999039Z [rank1]:E1204 09:29:43.620000 50732 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7000579Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.7003153Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T10:13:47.7005435Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7006583Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7008699Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.7010415Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7011735Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7013118Z [rank1]:E1204 09:29:43.620000 50732 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.7014191Z dist init r=2, world=4 2025-12-04T10:13:47.7014620Z dist init r=0, world=4 2025-12-04T10:13:47.7015042Z dist init r=3, world=4 2025-12-04T10:13:47.7015377Z dist init r=1, world=4 2025-12-04T10:13:47.7016852Z [rank0]:[W1204 09:29:44.127771099 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.7018387Z FAILED [8.2958s] [100%] 2025-12-04T10:13:47.7018601Z 2025-12-04T10:13:47.7018895Z =================================== FAILURES =================================== 2025-12-04T10:13:47.7019711Z _ TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda _ 2025-12-04T10:13:47.7020479Z Traceback (most recent call last): 2025-12-04T10:13:47.7021388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.7022346Z self._join_processes(fn) 2025-12-04T10:13:47.7023210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.7024191Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.7025274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.7026324Z raise RuntimeError(error) 2025-12-04T10:13:47.7026793Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.7027380Z Traceback (most recent call last): 2025-12-04T10:13:47.7028186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7028957Z getattr(self, test_name)() 2025-12-04T10:13:47.7029820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7030607Z fn() 2025-12-04T10:13:47.7031277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7032056Z method(*args, **kwargs) 2025-12-04T10:13:47.7032801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7033587Z method(*args, **kwargs) 2025-12-04T10:13:47.7034365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7035096Z with policy(): 2025-12-04T10:13:47.7035821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7036658Z raise RuntimeError(msg) 2025-12-04T10:13:47.7038126Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 732889088. 
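The ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit") points at missing teardown in the per-rank worker. A minimal sketch of the recommended cleanup, assuming the default process group was initialized earlier in the worker:

    import torch.distributed as dist

    # Tear down the process group explicitly at the end of the worker so
    # NCCL resources are released before interpreter exit.
    if dist.is_initialized():
        dist.destroy_process_group()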
2025-12-04T10:13:47.7039406Z 2025-12-04T10:13:47.7039616Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7040766Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.7041630Z 2025-12-04T10:13:47.7041904Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7042300Z 2025-12-04T10:13:47.7042305Z 2025-12-04T10:13:47.7042599Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.7043287Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.7044417Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-38750597d70d3b79.xml - 2025-12-04T10:13:47.7045516Z =========================== short test summary info ============================ 2025-12-04T10:13:47.7046736Z FAILED [8.2958s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.7047869Z Traceback (most recent call last): 2025-12-04T10:13:47.7048690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7049520Z getattr(self, test_name)() 2025-12-04T10:13:47.7050349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7051182Z fn() 2025-12-04T10:13:47.7052001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7052819Z method(*args, **kwargs) 2025-12-04T10:13:47.7111100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7111828Z method(*args, **kwargs) 2025-12-04T10:13:47.7112615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7113304Z with policy(): 2025-12-04T10:13:47.7113935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7114647Z raise RuntimeError(msg) 2025-12-04T10:13:47.7116032Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 732889088. 2025-12-04T10:13:47.7117347Z 2025-12-04T10:13:47.7117544Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7118569Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.7119446Z 2025-12-04T10:13:47.7119695Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7120236Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
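The failure itself is raised by the CUDA memory-leak checker that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables: it snapshots caching-allocator and driver-level memory before the test body and compares again afterwards. A loose sketch of that idea, not the actual CudaMemoryLeakCheck implementation and noisier than the real check:

    import torch

    def run_with_leak_check(fn, device=0):
        # Snapshot allocator usage and driver-reported free memory before the test body.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)
        free_before, _total = torch.cuda.mem_get_info(device)
        fn()
        # Flush cached blocks so any growth reflects still-referenced allocations.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _total = torch.cuda.mem_get_info(device)
        if alloc_after > alloc_before or free_after < free_before:
            raise RuntimeError(
                f"possible CUDA leak: allocator {alloc_before} -> {alloc_after} bytes, "
                f"driver free memory {free_before} -> {free_after} bytes"
            )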
2025-12-04T10:13:47.7120680Z ======================= 1 failed, 32 deselected in 8.51s ======================= 2025-12-04T10:13:47.7121096Z Got exit code 1 2025-12-04T10:13:47.7121335Z Retrying single test... 2025-12-04T10:13:47.7122080Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8d9e40030a96f20.xml 2025-12-04T10:13:47.7122932Z ============================= test session starts ============================== 2025-12-04T10:13:47.7123539Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.7124084Z cachedir: .pytest_cache 2025-12-04T10:13:47.7124730Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.7125444Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.7125756Z configfile: pytest.ini 2025-12-04T10:13:47.7126417Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.7127236Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:47.7128332Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.7129323Z Running 1 items in this shard 2025-12-04T10:13:47.7129513Z 2025-12-04T10:13:47.7130563Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda I1204 09:29:50.270000 51000 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 51052 2025-12-04T10:13:47.7132184Z I1204 09:29:50.271000 51000 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 51053 2025-12-04T10:13:47.7133380Z I1204 09:29:50.271000 51000 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 51054 2025-12-04T10:13:47.7134652Z I1204 09:29:50.272000 51000 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 51055 2025-12-04T10:13:47.7136523Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.7138007Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.7139980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:47.7141970Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.7143495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.7144976Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.7146895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.7148691Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.7150033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.7151378Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.7153333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.7155206Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.7156629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.7158015Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.7159833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:47.7161695Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.7162391Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.7163446Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.7165082Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7166529Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.7167968Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7169299Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.7170642Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7172032Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7173489Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7175215Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7176779Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7178342Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.7180087Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7181731Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.7184075Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 707723264 and is now 732889088. 
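The repeated `enable_nested_tensor` warnings come from building the encoder around an attention layer that is not batch-first, which turns the nested-tensor fast path off. A minimal sketch of the construction the warning asks for (the sizes are placeholders, not the test's values):

    import torch.nn as nn

    # batch_first=True on the layer lets TransformerEncoder keep its
    # nested-tensor fast path (enable_nested_tensor=True) active.
    encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
    encoder = nn.TransformerEncoder(encoder_layer, num_layers=6, enable_nested_tensor=True)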
2025-12-04T10:13:47.7186273Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7187423Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7189407Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.7191247Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7192321Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7193554Z [rank0]:E1204 09:29:56.991000 51052 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.7194592Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.7195581Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.7197044Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7198479Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.7199954Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7201294Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.7202605Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7203998Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7205385Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7206775Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7208220Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7209605Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.7210972Z [rank3]:E1204 09:29:56.993000 51055 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7212371Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.7214797Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 607059968 and is now 623837184. 2025-12-04T10:13:47.7217018Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7218167Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7220152Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.7221830Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7223037Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7224458Z [rank3]:E1204 09:29:56.993000 51055 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.7225584Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.7226701Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.7228166Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7229635Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.7231080Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7232421Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.7233736Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7235138Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7236534Z [rank1]:E1204 09:29:56.993000 51053 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7237955Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7239389Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7240740Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.7242109Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7243515Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.7245599Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 623837184. 2025-12-04T10:13:47.7247556Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7248572Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7250334Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.7251856Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7252931Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7254729Z [rank1]:E1204 09:29:56.993000 51053 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.7255852Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.7256975Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.7258674Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7261098Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.7263845Z [rank2]:E1204 09:29:56.997000 51054 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7266514Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.7268774Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7272081Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7274939Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7277791Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7281226Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7284207Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.7286971Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7289776Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.7294352Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 623837184. 
2025-12-04T10:13:47.7298519Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7300569Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7303559Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.7305286Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7306645Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7308002Z [rank2]:E1204 09:29:56.997000 51054 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.7308758Z dist init r=3, world=4 2025-12-04T10:13:47.7309013Z dist init r=2, world=4 2025-12-04T10:13:47.7309277Z dist init r=1, world=4 2025-12-04T10:13:47.7309574Z dist init r=0, world=4 2025-12-04T10:13:47.7310858Z [rank0]:[W1204 09:29:57.530086763 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.7312182Z FAILED [8.8057s] [100%] 2025-12-04T10:13:47.7312357Z 2025-12-04T10:13:47.7312498Z =================================== FAILURES =================================== 2025-12-04T10:13:47.7313128Z _ TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda _ 2025-12-04T10:13:47.7313729Z Traceback (most recent call last): 2025-12-04T10:13:47.7314472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.7315228Z self._join_processes(fn) 2025-12-04T10:13:47.7315989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.7316856Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.7317705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.7318578Z raise RuntimeError(error) 2025-12-04T10:13:47.7319099Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:47.7319541Z Traceback (most recent call last): 2025-12-04T10:13:47.7320270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7321008Z getattr(self, test_name)() 2025-12-04T10:13:47.7321695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7322406Z fn() 2025-12-04T10:13:47.7323009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7323704Z method(*args, **kwargs) 2025-12-04T10:13:47.7324355Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7325050Z method(*args, **kwargs) 2025-12-04T10:13:47.7325703Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7326387Z with policy(): 2025-12-04T10:13:47.7327013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7327726Z raise RuntimeError(msg) 2025-12-04T10:13:47.7329111Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 623837184. 2025-12-04T10:13:47.7330426Z 2025-12-04T10:13:47.7330634Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7331693Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.7332520Z 2025-12-04T10:13:47.7332772Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7333154Z 2025-12-04T10:13:47.7333411Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:47.7333980Z Traceback (most recent call last): 2025-12-04T10:13:47.7334751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7335550Z getattr(self, test_name)() 2025-12-04T10:13:47.7336353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7337112Z fn() 2025-12-04T10:13:47.7337740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7338491Z method(*args, **kwargs) 2025-12-04T10:13:47.7339194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7339939Z method(*args, **kwargs) 2025-12-04T10:13:47.7340627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7341367Z with policy(): 2025-12-04T10:13:47.7342041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7342786Z raise RuntimeError(msg) 2025-12-04T10:13:47.7344302Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 623837184. 
2025-12-04T10:13:47.7345837Z 2025-12-04T10:13:47.7346030Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7346998Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.7347778Z 2025-12-04T10:13:47.7348023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7348377Z 2025-12-04T10:13:47.7348381Z 2025-12-04T10:13:47.7348582Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.7349135Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.7350198Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8d9e40030a96f20.xml - 2025-12-04T10:13:47.7351186Z =========================== short test summary info ============================ 2025-12-04T10:13:47.7352264Z FAILED [8.8057s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:47.7353296Z Traceback (most recent call last): 2025-12-04T10:13:47.7353988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7354695Z getattr(self, test_name)() 2025-12-04T10:13:47.7355348Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7356028Z fn() 2025-12-04T10:13:47.7356596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7357252Z method(*args, **kwargs) 2025-12-04T10:13:47.7357907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7358574Z method(*args, **kwargs) 2025-12-04T10:13:47.7359194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7359846Z with policy(): 2025-12-04T10:13:47.7360445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7361120Z raise RuntimeError(msg) 2025-12-04T10:13:47.7362454Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 623837184. 
2025-12-04T10:13:47.7363696Z 2025-12-04T10:13:47.7363888Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7364849Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.7365639Z 2025-12-04T10:13:47.7365872Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7366226Z 2025-12-04T10:13:47.7366377Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:47.7366730Z Traceback (most recent call last): 2025-12-04T10:13:47.7367419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7368120Z getattr(self, test_name)() 2025-12-04T10:13:47.7368806Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7369487Z fn() 2025-12-04T10:13:47.7370057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7370757Z method(*args, **kwargs) 2025-12-04T10:13:47.7371374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7372037Z method(*args, **kwargs) 2025-12-04T10:13:47.7372656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7373399Z with policy(): 2025-12-04T10:13:47.7374205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7374967Z raise RuntimeError(msg) 2025-12-04T10:13:47.7376461Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T10:13:47.7377867Z 2025-12-04T10:13:47.7378092Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7379418Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.7380308Z 2025-12-04T10:13:47.7380575Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7381157Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
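The harness behaviour around this point (rerun the failing test on its own, then mark it FAILED CONSISTENTLY and move on because continue-through-error is set) is a plain retry-then-continue loop. A rough sketch of that pattern, using a generic pytest invocation rather than PyTorch's actual run_test.py:

    import subprocess

    def run_pytest(args):
        # Illustrative helper: invoke pytest and report its exit code.
        return subprocess.run(["python", "-m", "pytest", *args]).returncode

    def retry_then_continue(test_id, retries=1):
        if run_pytest([test_id]) == 0:
            return True
        for _ in range(retries):
            # Retry the single failing test in isolation.
            if run_pytest([test_id]) == 0:
                return True
        # Failed consistently; report it and let the caller continue with the rest.
        print(f"FAILED CONSISTENTLY: {test_id}")
        return False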
2025-12-04T10:13:47.7381649Z ======================= 1 failed, 32 deselected in 9.02s ======================= 2025-12-04T10:13:47.7382050Z Got exit code 1 2025-12-04T10:13:47.7382884Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T10:13:47.7384165Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:47.7385328Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-257466dde9fb107b.xml 2025-12-04T10:13:47.7386247Z ============================= test session starts ============================== 2025-12-04T10:13:47.7386905Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.7387502Z cachedir: .pytest_cache 2025-12-04T10:13:47.7388193Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.7388955Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.7389342Z configfile: pytest.ini 2025-12-04T10:13:47.7393888Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.7394670Z collecting ... collected 60 items / 2 deselected / 58 selected 2025-12-04T10:13:47.7395104Z stepcurrent: skipping 2 already run items. 2025-12-04T10:13:47.7395440Z Running 31 items in this shard 2025-12-04T10:13:47.7395622Z 2025-12-04T10:13:47.7396615Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda I1204 09:30:03.649000 51321 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 51373 2025-12-04T10:13:47.7398154Z I1204 09:30:03.650000 51321 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 51374 2025-12-04T10:13:47.7399149Z I1204 09:30:03.651000 51321 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 51375 2025-12-04T10:13:47.7400209Z I1204 09:30:03.652000 51321 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 51376 2025-12-04T10:13:47.7401871Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.7403232Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.7404329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:47.7405487Z {} 2025-12-04T10:13:47.7406043Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:47.7406640Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:47.7408465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.7410230Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.7411589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.7412909Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.7414326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:47.7415673Z {} 2025-12-04T10:13:47.7416297Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:47.7416972Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:47.7419020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.7421019Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.7422586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.7424079Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.7425734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:47.7427733Z {} 2025-12-04T10:13:47.7428336Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:47.7428972Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:47.7430903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:47.7432881Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.7434326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.7435729Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.7436965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:47.7438109Z {} 2025-12-04T10:13:47.7438673Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:47.7439277Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:47.7441100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.7442862Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.7443526Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.7444535Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.7446080Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7447537Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.7448987Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7450341Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.7451702Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7453103Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7454900Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7456470Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7458045Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7459574Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.7461160Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7462780Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.7465112Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 602865664 and is now 623837184. 2025-12-04T10:13:47.7467380Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7468415Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7470176Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7471667Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7472737Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7473982Z [rank1]:E1204 09:30:10.405000 51374 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.7474986Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.7476006Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.7477478Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7479274Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.7480913Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7482525Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.7484029Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7485598Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7487183Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7488764Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7490350Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7491983Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.7493440Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7495182Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.7497545Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 714014720 and is now 732889088. 
2025-12-04T10:13:47.7499762Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7500922Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7502897Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7504586Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7505897Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7507210Z [rank0]:E1204 09:30:10.406000 51373 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.7508316Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.7509364Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.7510932Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7512472Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.7514116Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7515454Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.7516781Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7518174Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7519581Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7521012Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7522397Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7523785Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.7525150Z [rank3]:E1204 09:30:10.406000 51376 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7526561Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.7528663Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 609157120 and is now 623837184. 2025-12-04T10:13:47.7530631Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7531645Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7533467Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7535306Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7536554Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7537936Z [rank3]:E1204 09:30:10.406000 51376 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.7539071Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.7540180Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.7541876Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7543510Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.7545146Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7546694Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.7548009Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7549411Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7550840Z [rank2]:E1204 09:30:10.406000 51375 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7552243Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7553747Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7555109Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.7556486Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7557892Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.7559978Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 607059968 and is now 623837184. 2025-12-04T10:13:47.7561940Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7562967Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7564730Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7566254Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7567328Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7568569Z [rank2]:E1204 09:30:10.406000 51375 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.7569268Z dist init r=2, world=4 2025-12-04T10:13:47.7569511Z dist init r=3, world=4 2025-12-04T10:13:47.7569751Z dist init r=1, world=4 2025-12-04T10:13:47.7569993Z dist init r=0, world=4 2025-12-04T10:13:47.7571199Z [rank0]:[W1204 09:30:10.916879315 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.7572413Z FAILED [9.0876s] [ 3%] 2025-12-04T10:13:47.7572581Z 2025-12-04T10:13:47.7572716Z =================================== FAILURES =================================== 2025-12-04T10:13:47.7573351Z _ TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda _ 2025-12-04T10:13:47.7574110Z Traceback (most recent call last): 2025-12-04T10:13:47.7574883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.7575676Z self._join_processes(fn) 2025-12-04T10:13:47.7576463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.7577352Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.7578230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.7579286Z raise RuntimeError(error) 2025-12-04T10:13:47.7579735Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:47.7580849Z Traceback (most recent call last): 2025-12-04T10:13:47.7581627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7582412Z getattr(self, test_name)() 2025-12-04T10:13:47.7583148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7583909Z fn() 2025-12-04T10:13:47.7584541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7585284Z method(*args, **kwargs) 2025-12-04T10:13:47.7585976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7586718Z method(*args, **kwargs) 2025-12-04T10:13:47.7587419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7588149Z with policy(): 2025-12-04T10:13:47.7588818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7589571Z raise RuntimeError(msg) 2025-12-04T10:13:47.7591108Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 602865664 and is now 623837184. 
2025-12-04T10:13:47.7592345Z 2025-12-04T10:13:47.7592543Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7593547Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7594334Z 2025-12-04T10:13:47.7594568Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7594925Z 2025-12-04T10:13:47.7595079Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:47.7595448Z Traceback (most recent call last): 2025-12-04T10:13:47.7596130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7596829Z getattr(self, test_name)() 2025-12-04T10:13:47.7597488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7598152Z fn() 2025-12-04T10:13:47.7598745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7599410Z method(*args, **kwargs) 2025-12-04T10:13:47.7600036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7600688Z method(*args, **kwargs) 2025-12-04T10:13:47.7601304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7601952Z with policy(): 2025-12-04T10:13:47.7602541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7603207Z raise RuntimeError(msg) 2025-12-04T10:13:47.7604506Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 607059968 and is now 623837184. 
2025-12-04T10:13:47.7605774Z 2025-12-04T10:13:47.7605972Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7606939Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7607735Z 2025-12-04T10:13:47.7607969Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7608335Z 2025-12-04T10:13:47.7608478Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.7608840Z Traceback (most recent call last): 2025-12-04T10:13:47.7609521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7610224Z getattr(self, test_name)() 2025-12-04T10:13:47.7610889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7611567Z fn() 2025-12-04T10:13:47.7612383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7613643Z method(*args, **kwargs) 2025-12-04T10:13:47.7615028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7616316Z method(*args, **kwargs) 2025-12-04T10:13:47.7617543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7618832Z with policy(): 2025-12-04T10:13:47.7619947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7621261Z raise RuntimeError(msg) 2025-12-04T10:13:47.7624007Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 609157120 and is now 623837184. 2025-12-04T10:13:47.7626719Z 2025-12-04T10:13:47.7627096Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7629005Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7630612Z 2025-12-04T10:13:47.7631105Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7631848Z 2025-12-04T10:13:47.7631855Z 2025-12-04T10:13:47.7632252Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.7633403Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T10:13:47.7635652Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-257466dde9fb107b.xml - 2025-12-04T10:13:47.7637588Z =========================== short test summary info ============================ 2025-12-04T10:13:47.7639477Z FAILED [9.0876s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:47.7641367Z Traceback (most recent call last): 2025-12-04T10:13:47.7642671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7643979Z getattr(self, test_name)() 2025-12-04T10:13:47.7645177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7646559Z fn() 2025-12-04T10:13:47.7647630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7648764Z method(*args, **kwargs) 2025-12-04T10:13:47.7649932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7650932Z method(*args, **kwargs) 2025-12-04T10:13:47.7651566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7652212Z with policy(): 2025-12-04T10:13:47.7652817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7653595Z raise RuntimeError(msg) 2025-12-04T10:13:47.7655221Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 602865664 and is now 623837184. 
2025-12-04T10:13:47.7656633Z 2025-12-04T10:13:47.7656849Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7657942Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7658816Z 2025-12-04T10:13:47.7659082Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7659482Z 2025-12-04T10:13:47.7659655Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:47.7660055Z Traceback (most recent call last): 2025-12-04T10:13:47.7660831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7661620Z getattr(self, test_name)() 2025-12-04T10:13:47.7662358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7663123Z fn() 2025-12-04T10:13:47.7663830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7664587Z method(*args, **kwargs) 2025-12-04T10:13:47.7665278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7666073Z method(*args, **kwargs) 2025-12-04T10:13:47.7666691Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7667332Z with policy(): 2025-12-04T10:13:47.7667928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7668598Z raise RuntimeError(msg) 2025-12-04T10:13:47.7669943Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 607059968 and is now 623837184. 
2025-12-04T10:13:47.7671174Z 2025-12-04T10:13:47.7671370Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7672326Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7673109Z 2025-12-04T10:13:47.7673341Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7673695Z 2025-12-04T10:13:47.7673848Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.7674203Z Traceback (most recent call last): 2025-12-04T10:13:47.7674879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7675625Z getattr(self, test_name)() 2025-12-04T10:13:47.7676288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7677158Z fn() 2025-12-04T10:13:47.7677754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7678453Z method(*args, **kwargs) 2025-12-04T10:13:47.7679593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7680327Z method(*args, **kwargs) 2025-12-04T10:13:47.7681018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7681751Z with policy(): 2025-12-04T10:13:47.7682419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7683173Z raise RuntimeError(msg) 2025-12-04T10:13:47.7684649Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 609157120 and is now 623837184. 2025-12-04T10:13:47.7686053Z 2025-12-04T10:13:47.7686272Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7687351Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7688224Z 2025-12-04T10:13:47.7688486Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7689071Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:47.7689558Z ======================= 1 failed, 2 deselected in 9.30s ======================== 2025-12-04T10:13:47.7689956Z Got exit code 1 2025-12-04T10:13:47.7690211Z Retrying single test... 
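Each run also repeats the UserWarning from _init_utils.py: FSDP was given `device_id` as the bare string "cuda" with no explicit index, so it falls back to the current device on every rank. The warning itself names two remedies; a minimal sketch of both, assuming a typical setup where `model` is an nn.Module and `rank` is this process's local rank (hypothetical names, not taken from the test):

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(model, rank):
    # Option 1: make the current device explicit before constructing FSDP,
    # so a bare device_id="cuda" resolves to the intended GPU.
    torch.cuda.set_device(rank)
    # Option 2: pass an explicit device index instead of the bare "cuda" string.
    return FSDP(model, device_id=torch.device("cuda", rank))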
2025-12-04T10:13:47.7691089Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-94f5dd2e01869af2.xml 2025-12-04T10:13:47.7692081Z ============================= test session starts ============================== 2025-12-04T10:13:47.7692682Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.7693299Z cachedir: .pytest_cache 2025-12-04T10:13:47.7694149Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.7694912Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.7695253Z configfile: pytest.ini 2025-12-04T10:13:47.7696018Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.7696898Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:47.7698065Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7699122Z Running 1 items in this shard 2025-12-04T10:13:47.7699339Z 2025-12-04T10:13:47.7700449Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda I1204 09:30:16.970000 51642 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 51694 2025-12-04T10:13:47.7702185Z I1204 09:30:16.970000 51642 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 51695 2025-12-04T10:13:47.7703307Z I1204 09:30:16.971000 51642 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 51696 2025-12-04T10:13:47.7704467Z I1204 09:30:16.972000 51642 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 51697 2025-12-04T10:13:47.7706351Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.7707717Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.7708819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:47.7709960Z {} 2025-12-04T10:13:47.7710513Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:47.7711111Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:47.7712932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:47.7714701Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.7716243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.7717637Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.7719050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.7720445Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.7721594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:47.7722810Z {} 2025-12-04T10:13:47.7723394Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:47.7724023Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:47.7725298Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:47.7726515Z {} 2025-12-04T10:13:47.7727273Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:47.7727916Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:47.7729902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.7731843Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.7734030Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:47.7736087Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.7737605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.7739083Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.7740314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:47.7741616Z {} 2025-12-04T10:13:47.7742276Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:47.7743416Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:47.7746056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.7748008Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.7748741Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.7749833Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.7751520Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7753109Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.7754703Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7756271Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.7757696Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7759182Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7760734Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7762129Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7763527Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7764907Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.7766277Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7767707Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.7769792Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 718209024 and is now 732889088. 2025-12-04T10:13:47.7771757Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7772770Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7774916Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7776597Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7777802Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7779392Z [rank0]:E1204 09:30:23.825000 51694 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.7780522Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.7781711Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.7783375Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7785007Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.7786639Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7788207Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.7789703Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7791321Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7792711Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7794095Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7795534Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7796886Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.7798290Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7799695Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.7801776Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 607059968 and is now 623837184. 
2025-12-04T10:13:47.7803744Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7804781Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7806551Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7808047Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7809115Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7810369Z [rank1]:E1204 09:30:23.825000 51695 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.7811374Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.7812365Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.7814110Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7815744Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.7817428Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7818948Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.7820438Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7822011Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7823594Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7825202Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7826877Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7828262Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.7829629Z [rank2]:E1204 09:30:23.826000 51696 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7831033Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.7833121Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 481230848 and is now 623837184. 2025-12-04T10:13:47.7835095Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7836122Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7837895Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7839369Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7840467Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7841704Z [rank2]:E1204 09:30:23.826000 51696 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.7842698Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.7843676Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.7845173Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7846632Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.7848078Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7849428Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.7850743Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7852144Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7853806Z [rank3]:E1204 09:30:23.829000 51697 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7855414Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7856989Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7858506Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.7860043Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7861625Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.7864259Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T10:13:47.7866628Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7867786Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7869781Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7871356Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7872594Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7873969Z [rank3]:E1204 09:30:23.829000 51697 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.7874777Z dist init r=1, world=4 2025-12-04T10:13:47.7875109Z dist init r=3, world=4 2025-12-04T10:13:47.7875474Z dist init r=0, world=4 2025-12-04T10:13:47.7875826Z dist init r=2, world=4 2025-12-04T10:13:47.7877182Z [rank0]:[W1204 09:30:24.336096680 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.7878469Z FAILED [8.4193s] [100%] 2025-12-04T10:13:47.7878861Z 2025-12-04T10:13:47.7879209Z =================================== FAILURES =================================== 2025-12-04T10:13:47.7880073Z _ TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda _ 2025-12-04T10:13:47.7880816Z Traceback (most recent call last): 2025-12-04T10:13:47.7881679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.7882619Z self._join_processes(fn) 2025-12-04T10:13:47.7883531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.7884570Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.7885583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.7885786Z raise RuntimeError(error) 2025-12-04T10:13:47.7886114Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.7886275Z Traceback (most recent call last): 2025-12-04T10:13:47.7886877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7887063Z getattr(self, test_name)() 2025-12-04T10:13:47.7887704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7887845Z fn() 2025-12-04T10:13:47.7888399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7888600Z method(*args, **kwargs) 2025-12-04T10:13:47.7889122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7889370Z method(*args, **kwargs) 2025-12-04T10:13:47.7889996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7890134Z with policy(): 2025-12-04T10:13:47.7890732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7890881Z raise RuntimeError(msg) 2025-12-04T10:13:47.7892186Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 604962816 and is now 623837184. 
2025-12-04T10:13:47.7892269Z 2025-12-04T10:13:47.7892516Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7893344Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7893352Z 2025-12-04T10:13:47.7893849Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7893856Z 2025-12-04T10:13:47.7893860Z 2025-12-04T10:13:47.7894122Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.7894521Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.7895404Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-94f5dd2e01869af2.xml - 2025-12-04T10:13:47.7895683Z =========================== short test summary info ============================ 2025-12-04T10:13:47.7896687Z FAILED [8.4193s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.7896852Z Traceback (most recent call last): 2025-12-04T10:13:47.7897493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7897620Z getattr(self, test_name)() 2025-12-04T10:13:47.7898239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7898438Z fn() 2025-12-04T10:13:47.7898982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7899177Z method(*args, **kwargs) 2025-12-04T10:13:47.7899787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7899949Z method(*args, **kwargs) 2025-12-04T10:13:47.7900592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7900759Z with policy(): 2025-12-04T10:13:47.7901306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7901505Z raise RuntimeError(msg) 2025-12-04T10:13:47.7902828Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T10:13:47.7902835Z 2025-12-04T10:13:47.7903150Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7903956Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7903962Z 2025-12-04T10:13:47.7904316Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7904536Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
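The repro command printed above can be run as-is from a PyTorch checkout. As a rough illustration of what PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 compares, a simplified sketch follows; this is not the real checker in common_utils.py, and run_test_body is a hypothetical placeholder for the test body.

# Minimal sketch of the before/after comparison behind the mem-leak check
# (simplified assumption: the real checker also compares driver-level memory).
import torch

def check_for_leak(run_test_body, device=0):
    # run_test_body is a hypothetical callable standing in for the test under check
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    before = torch.cuda.memory_allocated(device)   # caching-allocator bytes before
    run_test_body()
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    after = torch.cuda.memory_allocated(device)    # caching-allocator bytes after
    if after > before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: {before} -> {after} bytes"
        )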
2025-12-04T10:13:47.7904758Z ======================= 1 failed, 32 deselected in 8.63s ======================= 2025-12-04T10:13:47.7904920Z Got exit code 1 2025-12-04T10:13:47.7905093Z Retrying single test... 2025-12-04T10:13:47.7905927Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-42c522b0340c97ac.xml 2025-12-04T10:13:47.7906224Z ============================= test session starts ============================== 2025-12-04T10:13:47.7906571Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.7906759Z cachedir: .pytest_cache 2025-12-04T10:13:47.7907231Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.7907445Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.7907641Z configfile: pytest.ini 2025-12-04T10:13:47.7908149Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.7908486Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:47.7909255Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7909373Z Running 1 items in this shard 2025-12-04T10:13:47.7909378Z 2025-12-04T10:13:47.7910519Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda I1204 09:30:30.430000 51963 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 52015 2025-12-04T10:13:47.7911005Z I1204 09:30:30.431000 51963 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 52016 2025-12-04T10:13:47.7911531Z I1204 09:30:30.431000 51963 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 52017 2025-12-04T10:13:47.7911998Z I1204 09:30:30.432000 51963 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 52018 2025-12-04T10:13:47.7913158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.7913341Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.7914314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:47.7914561Z {} 2025-12-04T10:13:47.7914909Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:47.7915178Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:47.7916706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:47.7916999Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.7918129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.7918280Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.7919445Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.7919591Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.7920774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.7920949Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.7921945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:47.7922188Z {} 2025-12-04T10:13:47.7922511Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:47.7922762Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:47.7924585Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.7924962Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.7926611Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:47.7926939Z {} 2025-12-04T10:13:47.7927551Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:47.7927880Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:47.7929714Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:47.7930017Z {} 2025-12-04T10:13:47.7930639Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:47.7931088Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:47.7934178Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.7934823Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.7938088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.7938515Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.7939416Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.7940488Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.7942487Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7943723Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.7945877Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7946749Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.7948607Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7949450Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7950999Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7952125Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7953829Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7954697Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.7956329Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7957247Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.7960264Z [rank0]:E1204 09:30:37.190000 
52015 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 720306176 and is now 732889088. 2025-12-04T10:13:47.7961126Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7962042Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7963215Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7963654Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7964348Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7965045Z [rank0]:E1204 09:30:37.190000 52015 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.7965513Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.7966097Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.7967142Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7967661Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.7968760Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7969196Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.7970181Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7970683Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7971709Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7972188Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7973161Z [rank1]:E1204 09:30:37.191000 52016 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7974005Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.7975017Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7975652Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.7977415Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 609157120 and is now 623837184. 2025-12-04T10:13:47.7977925Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7978854Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7980151Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7980571Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7981329Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7981941Z [rank1]:E1204 09:30:37.191000 52016 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.7982479Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.7983177Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.7984308Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.7984909Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.7985936Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.7986349Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.7987504Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7988047Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7989096Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.7989621Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.7990753Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.7991211Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.7992164Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.7992679Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.7994273Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 607059968 and is now 623837184. 
2025-12-04T10:13:47.7994681Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7995302Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.7996516Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.7996876Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.7997592Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.7998113Z [rank3]:E1204 09:30:37.191000 52018 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.7998552Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.7999194Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8000158Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8000691Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8001597Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8002010Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8002942Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8003446Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8004390Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8004851Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8005781Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8006244Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8007186Z [rank2]:E1204 09:30:37.194000 52017 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8007699Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8009262Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T10:13:47.8009675Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8010296Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8011419Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.8011804Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8012539Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8013063Z [rank2]:E1204 09:30:37.194000 52017 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.8013273Z dist init r=1, world=4 2025-12-04T10:13:47.8013550Z dist init r=3, world=4 2025-12-04T10:13:47.8013827Z dist init r=2, world=4 2025-12-04T10:13:47.8014071Z dist init r=0, world=4 2025-12-04T10:13:47.8015278Z [rank0]:[W1204 09:30:37.704186404 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.8015428Z FAILED [8.4173s] [100%] 2025-12-04T10:13:47.8015436Z 2025-12-04T10:13:47.8015671Z =================================== FAILURES =================================== 2025-12-04T10:13:47.8016074Z _ TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda _ 2025-12-04T10:13:47.8016320Z Traceback (most recent call last): 2025-12-04T10:13:47.8016929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.8017096Z self._join_processes(fn) 2025-12-04T10:13:47.8017776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.8017960Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.8018607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.8018822Z raise RuntimeError(error) 2025-12-04T10:13:47.8019131Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.8019340Z Traceback (most recent call last): 2025-12-04T10:13:47.8019918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8020102Z getattr(self, test_name)() 2025-12-04T10:13:47.8020703Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8020865Z fn() 2025-12-04T10:13:47.8021523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8021732Z method(*args, **kwargs) 2025-12-04T10:13:47.8022276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8022470Z method(*args, **kwargs) 2025-12-04T10:13:47.8022988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8023167Z with policy(): 2025-12-04T10:13:47.8023785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8023935Z raise RuntimeError(msg) 2025-12-04T10:13:47.8025296Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 607059968 and is now 623837184. 
2025-12-04T10:13:47.8025306Z 2025-12-04T10:13:47.8025663Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8026422Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.8026428Z 2025-12-04T10:13:47.8026718Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8026723Z 2025-12-04T10:13:47.8026728Z 2025-12-04T10:13:47.8026962Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.8027285Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.8028058Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-42c522b0340c97ac.xml - 2025-12-04T10:13:47.8028283Z =========================== short test summary info ============================ 2025-12-04T10:13:47.8029160Z FAILED [8.4173s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.8029319Z Traceback (most recent call last): 2025-12-04T10:13:47.8029889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8030023Z getattr(self, test_name)() 2025-12-04T10:13:47.8030617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8030716Z fn() 2025-12-04T10:13:47.8031237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8031481Z method(*args, **kwargs) 2025-12-04T10:13:47.8031964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8032146Z method(*args, **kwargs) 2025-12-04T10:13:47.8032624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8032726Z with policy(): 2025-12-04T10:13:47.8033309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8033441Z raise RuntimeError(msg) 2025-12-04T10:13:47.8034698Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 607059968 and is now 623837184. 2025-12-04T10:13:47.8034730Z 2025-12-04T10:13:47.8034958Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8035657Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.8035662Z 2025-12-04T10:13:47.8035991Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8036204Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
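The ProcessGroupNCCL warning repeated above ("destroy_process_group() was not called before program exit") is advisory and separate from the leak failure itself. A minimal sketch of the recommended init/teardown pairing follows, assuming the usual env:// rendezvous variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE) are set by the launcher.

# Sketch: pairing init_process_group with destroy_process_group so the NCCL
# shutdown warning above does not fire at exit.
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
    try:
        pass  # training / test body goes here
    finally:
        dist.destroy_process_group()  # explicit shutdown avoids the resource-leak warning

if __name__ == "__main__":
    main()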
2025-12-04T10:13:47.8036450Z ======================= 1 failed, 32 deselected in 8.64s ======================= 2025-12-04T10:13:47.8036573Z Got exit code 1 2025-12-04T10:13:47.8037206Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T10:13:47.8037630Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:47.8038249Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-32205b0cc860e51d.xml 2025-12-04T10:13:47.8038500Z ============================= test session starts ============================== 2025-12-04T10:13:47.8038844Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.8038976Z cachedir: .pytest_cache 2025-12-04T10:13:47.8039514Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.8039638Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.8039943Z configfile: pytest.ini 2025-12-04T10:13:47.8040483Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.8040859Z collecting ... collected 60 items / 3 deselected / 57 selected 2025-12-04T10:13:47.8041183Z stepcurrent: skipping 3 already run items. 2025-12-04T10:13:47.8041414Z Running 30 items in this shard 2025-12-04T10:13:47.8041424Z 2025-12-04T10:13:47.8043130Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda I1204 09:30:43.819000 52284 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 52336 2025-12-04T10:13:47.8043828Z I1204 09:30:43.820000 52284 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 52337 2025-12-04T10:13:47.8044334Z I1204 09:30:43.821000 52284 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 52338 2025-12-04T10:13:47.8044941Z I1204 09:30:43.822000 52284 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 52339 2025-12-04T10:13:47.8046150Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.8046338Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.8048045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:47.8048309Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.8049540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.8049730Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.8051438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.8051611Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.8052907Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.8053067Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.8055106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.8055395Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.8056665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.8056900Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.8058681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:47.8058935Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.8059443Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8060062Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8061108Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8061698Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8062796Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8063247Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8064289Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8064848Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8065971Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8066487Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8067426Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8067857Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8068743Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8069240Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8070824Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 716111872 and is now 732889088. 
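The repeated FSDP UserWarning above ("FSDP got the argument `device_id` cuda ... which does not have an explicit index") is caused by passing a bare "cuda" device; the sketch below shows the two remedies the warning itself suggests. local_rank is assumed to come from the launcher (e.g. the LOCAL_RANK environment variable) and the process group is assumed to be initialized; this is not the test's own code.

# Sketch: avoiding the "`device_id` cuda ... does not have an explicit index" warning.
import os
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

local_rank = int(os.environ.get("LOCAL_RANK", 0))

# Option 1: set the current device first, so a bare "cuda" resolves to the right index.
torch.cuda.set_device(local_rank)

# Option 2: pass an explicit device (or integer index) as device_id.
def shard(model):
    return FSDP(model, device_id=torch.device("cuda", local_rank))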
2025-12-04T10:13:47.8071255Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8071929Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8073103Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8073466Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8074120Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8074744Z [rank0]:E1204 09:30:50.577000 52336 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.8075178Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8075754Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8076679Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8077189Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8078143Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8078547Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8079874Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8080475Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8081569Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8082067Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8083179Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8083671Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8084668Z [rank1]:E1204 09:30:50.577000 52337 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8085259Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8087023Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 623837184. 2025-12-04T10:13:47.8087501Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8088311Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8089595Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8090010Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8090813Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8091634Z [rank1]:E1204 09:30:50.577000 52337 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.8092110Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8092678Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8093850Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8094449Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8095521Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8096062Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8097080Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8097646Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8098691Z [rank3]:E1204 09:30:50.579000 52339 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8099213Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8100233Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8107968Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8108906Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8109345Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8110883Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 607059968 and is now 623837184. 2025-12-04T10:13:47.8111298Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8111883Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8112937Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8113263Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8113927Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8114414Z [rank3]:E1204 09:30:50.579000 52339 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.8114812Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8115281Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8116167Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8116617Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8117531Z [rank2]:E1204 09:30:50.583000 52338 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8117926Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8118771Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8119207Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8120049Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8120486Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8121330Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8121734Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8122581Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8123009Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8124576Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 623837184. 
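[editor's note] The RuntimeError repeated above for each rank comes from PyTorch's CUDA memory-leak checker, which snapshots allocator statistics before the test body and compares them afterwards. In this log the caching allocator grew from 512 to 12800 bytes on every rank and the driver-level total by roughly 14-17 MB, which is what trips the check. A minimal sketch of the same idea, using only public torch.cuda APIs (torch.cuda.memory_allocated and torch.cuda.mem_get_info); the helper name, thresholds, and single-device scope are illustrative assumptions, not the actual harness logic:

    import torch

    def check_for_leak(fn, device=0):
        """Run fn() and report whether CUDA memory grew on `device`.

        Loosely mirrors what PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 does:
        compare caching-allocator and driver-level usage before/after.
        """
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before = torch.cuda.memory_allocated(device)   # caching allocator bytes
        free_before, total = torch.cuda.mem_get_info(device)
        driver_before = total - free_before                   # driver-level usage

        fn()

        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after

        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak: allocator {alloc_before} -> {alloc_after}, "
                f"driver {driver_before} -> {driver_after}"
            )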
2025-12-04T10:13:47.8124898Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8125672Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8126825Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8127174Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8127845Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8128363Z [rank2]:E1204 09:30:50.583000 52338 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.8128458Z dist init r=0, world=4 2025-12-04T10:13:47.8128548Z dist init r=3, world=4 2025-12-04T10:13:47.8128642Z dist init r=1, world=4 2025-12-04T10:13:47.8128729Z dist init r=2, world=4 2025-12-04T10:13:47.8129813Z [rank0]:[W1204 09:30:50.090913095 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.8129936Z FAILED [8.5227s] [ 3%] 2025-12-04T10:13:47.8129944Z 2025-12-04T10:13:47.8130078Z =================================== FAILURES =================================== 2025-12-04T10:13:47.8130423Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda _ 2025-12-04T10:13:47.8130560Z Traceback (most recent call last): 2025-12-04T10:13:47.8131072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.8131179Z self._join_processes(fn) 2025-12-04T10:13:47.8131725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.8131854Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.8132424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.8132530Z raise RuntimeError(error) 2025-12-04T10:13:47.8132750Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.8132859Z Traceback (most recent call last): 2025-12-04T10:13:47.8133468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8133582Z getattr(self, test_name)() 2025-12-04T10:13:47.8134271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8134367Z fn() 2025-12-04T10:13:47.8134866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8134967Z method(*args, **kwargs) 2025-12-04T10:13:47.8135471Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8135578Z method(*args, **kwargs) 2025-12-04T10:13:47.8136076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8136210Z with policy(): 2025-12-04T10:13:47.8136721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8136830Z raise RuntimeError(msg) 2025-12-04T10:13:47.8138093Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 716111872 and is now 732889088. 2025-12-04T10:13:47.8138100Z 2025-12-04T10:13:47.8138314Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8139089Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8139099Z 2025-12-04T10:13:47.8139364Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8139371Z 2025-12-04T10:13:47.8139378Z 2025-12-04T10:13:47.8139601Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.8139858Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.8140658Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-32205b0cc860e51d.xml - 2025-12-04T10:13:47.8140824Z =========================== short test summary info ============================ 2025-12-04T10:13:47.8141729Z FAILED [8.5227s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.8142383Z Traceback (most recent call last): 2025-12-04T10:13:47.8142929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8143040Z getattr(self, test_name)() 2025-12-04T10:13:47.8143618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8143702Z fn() 2025-12-04T10:13:47.8144208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8144310Z method(*args, **kwargs) 2025-12-04T10:13:47.8144805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8144910Z method(*args, **kwargs) 2025-12-04T10:13:47.8145408Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8145622Z with policy(): 2025-12-04T10:13:47.8146205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8146300Z raise RuntimeError(msg) 2025-12-04T10:13:47.8147430Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 716111872 and is now 732889088. 2025-12-04T10:13:47.8147436Z 2025-12-04T10:13:47.8147624Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8148286Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8148293Z 2025-12-04T10:13:47.8148526Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8148683Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:47.8148933Z ======================= 1 failed, 3 deselected in 8.74s ======================== 2025-12-04T10:13:47.8149020Z Got exit code 1 2025-12-04T10:13:47.8149111Z Retrying single test... 2025-12-04T10:13:47.8149677Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e08ad6962badbec0.xml 2025-12-04T10:13:47.8149819Z ============================= test session starts ============================== 2025-12-04T10:13:47.8150129Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.8150222Z cachedir: .pytest_cache 2025-12-04T10:13:47.8150674Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.8150817Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.8150913Z configfile: pytest.ini 2025-12-04T10:13:47.8151390Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.8151580Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:47.8152309Z stepcurrent: skipping 3 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8152414Z Running 1 items in this shard 2025-12-04T10:13:47.8152419Z 2025-12-04T10:13:47.8153395Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda I1204 09:30:57.180000 52605 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 52657 2025-12-04T10:13:47.8153872Z I1204 09:30:57.181000 52605 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 52658 2025-12-04T10:13:47.8154307Z I1204 09:30:57.181000 52605 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 52659 2025-12-04T10:13:47.8154738Z I1204 09:30:57.182000 52605 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 52660 2025-12-04T10:13:47.8155873Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.8155986Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.8157079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.8157192Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.8158707Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.8158855Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.8160368Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.8160521Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.8161631Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.8161746Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.8163239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:47.8163389Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.8164500Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.8164611Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.8166115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.8166262Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.8166672Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8167187Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8168074Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8168543Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8169414Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8169771Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8170625Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8171064Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8171907Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8172342Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8173183Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8173821Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8174821Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:13:47.8175310Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8177037Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 718209024 and is now 732889088. 2025-12-04T10:13:47.8177440Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8178102Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8179496Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8179869Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8180575Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8181121Z [rank0]:E1204 09:31:03.917000 52657 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.8181645Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8182174Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8183218Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8183721Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8184708Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8185111Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8186067Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8186559Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8187515Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8187998Z [rank2]:E1204 09:31:03.917000 
52659 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8188993Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8189432Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8190397Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8190965Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8192533Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 609157120 and is now 623837184. 2025-12-04T10:13:47.8192858Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8193449Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8194504Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8194831Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8195492Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8195973Z [rank2]:E1204 09:31:03.917000 52659 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.8196401Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8196868Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8197747Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8198193Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8199066Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8199425Z [rank3]:E1204 09:31:03.917000 52660 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8200263Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8200700Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8201546Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8202005Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8202848Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8203240Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8204093Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8204548Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8206086Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 607059968 and is now 623837184. 
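[editor's note] The _init_utils.py UserWarning seen earlier in each run ("FSDP got the argument `device_id` cuda ... which does not have an explicit index") is advisory only: the test passes a bare "cuda" device, so FSDP falls back to the current device. A minimal sketch of the two fixes the warning itself suggests; the wrapper function and rank plumbing are hypothetical, and the default process group is assumed to be initialized already:

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(model, rank):
        # Option 1: make the current device explicit before wrapping ...
        torch.cuda.set_device(rank)
        # ... and/or Option 2: pass an indexed device as device_id instead of bare "cuda".
        return FSDP(model, device_id=torch.device("cuda", rank))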
2025-12-04T10:13:47.8206407Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8206995Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8208053Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8208403Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8209030Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8209532Z [rank3]:E1204 09:31:03.917000 52660 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.8209935Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8210399Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8211290Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8211739Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8212610Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8212963Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8214091Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8214587Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8215572Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8216064Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8217021Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8217463Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8218468Z [rank1]:E1204 09:31:03.918000 52658 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8218959Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8220678Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T10:13:47.8221037Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8221689Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8222915Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8223310Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8224019Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8224566Z [rank1]:E1204 09:31:03.918000 52658 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.8224671Z dist init r=0, world=4 2025-12-04T10:13:47.8224765Z dist init r=2, world=4 2025-12-04T10:13:47.8224860Z dist init r=1, world=4 2025-12-04T10:13:47.8224963Z dist init r=3, world=4 2025-12-04T10:13:47.8226206Z [rank0]:[W1204 09:31:04.431721373 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.8226303Z FAILED [8.4324s] [100%] 2025-12-04T10:13:47.8226308Z 2025-12-04T10:13:47.8226435Z =================================== FAILURES =================================== 2025-12-04T10:13:47.8226752Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda _ 2025-12-04T10:13:47.8226860Z Traceback (most recent call last): 2025-12-04T10:13:47.8227341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.8227446Z self._join_processes(fn) 2025-12-04T10:13:47.8227962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.8228087Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.8228653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.8228752Z raise RuntimeError(error) 2025-12-04T10:13:47.8228958Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.8229065Z Traceback (most recent call last): 2025-12-04T10:13:47.8229540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8229646Z getattr(self, test_name)() 2025-12-04T10:13:47.8230108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8230183Z fn() 2025-12-04T10:13:47.8230665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8230758Z method(*args, **kwargs) 2025-12-04T10:13:47.8231199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8231293Z method(*args, **kwargs) 2025-12-04T10:13:47.8231732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8231822Z with policy(): 2025-12-04T10:13:47.8232266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8232357Z raise RuntimeError(msg) 2025-12-04T10:13:47.8233486Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 718209024 and is now 732889088. 
2025-12-04T10:13:47.8233519Z 2025-12-04T10:13:47.8233707Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8234367Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8234397Z 2025-12-04T10:13:47.8234627Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8234632Z 2025-12-04T10:13:47.8234636Z 2025-12-04T10:13:47.8234833Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.8235063Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.8235766Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e08ad6962badbec0.xml - 2025-12-04T10:13:47.8235921Z =========================== short test summary info ============================ 2025-12-04T10:13:47.8236909Z FAILED [8.4324s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.8237065Z Traceback (most recent call last): 2025-12-04T10:13:47.8237991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8238165Z getattr(self, test_name)() 2025-12-04T10:13:47.8239044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8239169Z fn() 2025-12-04T10:13:47.8239963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8240124Z method(*args, **kwargs) 2025-12-04T10:13:47.8240900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8241065Z method(*args, **kwargs) 2025-12-04T10:13:47.8241989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8242137Z with policy(): 2025-12-04T10:13:47.8242909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8243076Z raise RuntimeError(msg) 2025-12-04T10:13:47.8245063Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 718209024 and is now 732889088. 2025-12-04T10:13:47.8245078Z 2025-12-04T10:13:47.8245503Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8246641Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8246816Z 2025-12-04T10:13:47.8247271Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8247580Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
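[editor's note] The ProcessGroupNCCL warning printed before each FAILED line ("destroy_process_group() was not called before program exit") concerns shutdown hygiene rather than the leak itself: each spawned rank exits without tearing down its process group. The usual pattern is to pair init and destroy; the rendezvous settings and rank/world-size plumbing below are an illustrative sketch, not the harness's own code:

    import os
    import torch
    import torch.distributed as dist

    def run(rank: int, world_size: int):
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")   # example rendezvous settings
        os.environ.setdefault("MASTER_PORT", "29500")
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(rank)
        try:
            pass  # training / test body goes here
        finally:
            dist.destroy_process_group()   # avoids the ProcessGroupNCCL shutdown warning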
2025-12-04T10:13:47.8247882Z ======================= 1 failed, 32 deselected in 8.65s ======================= 2025-12-04T10:13:47.8248197Z Got exit code 1 2025-12-04T10:13:47.8248369Z Retrying single test... 2025-12-04T10:13:47.8249547Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2ea67fcde569130f.xml 2025-12-04T10:13:47.8249820Z ============================= test session starts ============================== 2025-12-04T10:13:47.8250540Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.8250729Z cachedir: .pytest_cache 2025-12-04T10:13:47.8251619Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.8251893Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.8252077Z configfile: pytest.ini 2025-12-04T10:13:47.8253001Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.8253475Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:47.8255196Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8255417Z Running 1 items in this shard 2025-12-04T10:13:47.8255428Z 2025-12-04T10:13:47.8257588Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda I1204 09:31:10.679000 52926 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 52978 2025-12-04T10:13:47.8258567Z I1204 09:31:10.680000 52926 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 52979 2025-12-04T10:13:47.8259543Z I1204 09:31:10.681000 52926 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 52980 2025-12-04T10:13:47.8260436Z I1204 09:31:10.682000 52926 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 52981 2025-12-04T10:13:47.8262729Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.8262961Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.8265447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.8265684Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.8267863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.8268076Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.8271001Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, 
which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.8271307Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.8274138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.8274425Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.8276525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:47.8276762Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.8277861Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:47.8278009Z self.encoder = TransformerEncoder( 2025-12-04T10:13:47.8279955Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:47.8280124Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:47.8280594Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8281124Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8282127Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8282642Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8283626Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8284032Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8285085Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8285584Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8286541Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8287062Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8288027Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8288471Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8289434Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8289921Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8291735Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 607059968 and is now 623837184. 
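[editor's note] The transformer.py UserWarning at the start of each run is likewise unrelated to the failure: the nested-tensor fast path in TransformerEncoder is skipped because the encoder layer was built without batch_first=True. A minimal sketch of the constructor arguments the warning refers to (sizes are placeholders):

    import torch.nn as nn

    encoder_layer = nn.TransformerEncoderLayer(
        d_model=32, nhead=4,
        batch_first=True,              # without this, enable_nested_tensor is silently ignored
    )
    encoder = nn.TransformerEncoder(
        encoder_layer, num_layers=2,
        enable_nested_tensor=True,     # requests the nested-tensor inference fast path
    )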
2025-12-04T10:13:47.8292094Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8292710Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8294031Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8294390Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8295119Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8295660Z [rank1]:E1204 09:31:17.414000 52979 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.8296118Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8296648Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8297647Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8298160Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8299176Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8299582Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8300534Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8301027Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8302012Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8302500Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8303462Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8303901Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8304870Z [rank0]:E1204 09:31:17.415000 52978 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8305356Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8307182Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 714014720 and is now 732889088. 2025-12-04T10:13:47.8307527Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8308109Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8309174Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8309493Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8310131Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8310612Z [rank0]:E1204 09:31:17.415000 52978 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.8311016Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8311485Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8312367Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8312851Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8313724Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8314076Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8314917Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8315381Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8316225Z [rank3]:E1204 09:31:17.415000 52981 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8316654Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8317505Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8317895Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8318756Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8319213Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8320772Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 609157120 and is now 623837184. 2025-12-04T10:13:47.8321089Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8321667Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8322737Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8323054Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8323688Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8324170Z [rank3]:E1204 09:31:17.415000 52981 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.8324571Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8325038Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8325941Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8326399Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8327265Z [rank2]:E1204 09:31:17.415000 52980 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8327619Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8328488Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8328922Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8329760Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8330185Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8331036Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8331458Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8332314Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8332779Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8334645Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 611254272 and is now 623837184. 
2025-12-04T10:13:47.8335010Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8335664Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8336862Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8337222Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8337942Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8338483Z [rank2]:E1204 09:31:17.415000 52980 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.8338590Z dist init r=2, world=4 2025-12-04T10:13:47.8338685Z dist init r=0, world=4 2025-12-04T10:13:47.8338777Z dist init r=3, world=4 2025-12-04T10:13:47.8338913Z dist init r=1, world=4 2025-12-04T10:13:47.8340074Z [rank0]:[W1204 09:31:17.917400365 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.8340168Z FAILED [8.4338s] [100%] 2025-12-04T10:13:47.8340181Z 2025-12-04T10:13:47.8340322Z =================================== FAILURES =================================== 2025-12-04T10:13:47.8340679Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda _ 2025-12-04T10:13:47.8340802Z Traceback (most recent call last): 2025-12-04T10:13:47.8341375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.8341487Z self._join_processes(fn) 2025-12-04T10:13:47.8342078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.8342216Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.8342818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.8342926Z raise RuntimeError(error) 2025-12-04T10:13:47.8343154Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.8343275Z Traceback (most recent call last): 2025-12-04T10:13:47.8343808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8343915Z getattr(self, test_name)() 2025-12-04T10:13:47.8344478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8344564Z fn() 2025-12-04T10:13:47.8345074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8345204Z method(*args, **kwargs) 2025-12-04T10:13:47.8345702Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8345804Z method(*args, **kwargs) 2025-12-04T10:13:47.8346360Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8346440Z with policy(): 2025-12-04T10:13:47.8346889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8346983Z raise RuntimeError(msg) 2025-12-04T10:13:47.8348115Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 714014720 and is now 732889088. 2025-12-04T10:13:47.8348124Z 2025-12-04T10:13:47.8348311Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8348971Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8348976Z 2025-12-04T10:13:47.8349206Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8349211Z 2025-12-04T10:13:47.8349216Z 2025-12-04T10:13:47.8349408Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.8349640Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.8350347Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2ea67fcde569130f.xml - 2025-12-04T10:13:47.8350523Z =========================== short test summary info ============================ 2025-12-04T10:13:47.8351326Z FAILED [8.4338s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.8351430Z Traceback (most recent call last): 2025-12-04T10:13:47.8351916Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8352011Z getattr(self, test_name)() 2025-12-04T10:13:47.8352487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8352593Z fn() 2025-12-04T10:13:47.8353038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8353132Z method(*args, **kwargs) 2025-12-04T10:13:47.8353574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8353665Z method(*args, **kwargs) 2025-12-04T10:13:47.8354105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8354187Z with policy(): 2025-12-04T10:13:47.8354637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8354729Z raise RuntimeError(msg) 2025-12-04T10:13:47.8355848Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 714014720 and is now 732889088. 2025-12-04T10:13:47.8355882Z 2025-12-04T10:13:47.8356075Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8356759Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8356764Z 2025-12-04T10:13:47.8356998Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8357154Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:47.8357304Z ======================= 1 failed, 32 deselected in 8.65s ======================= 2025-12-04T10:13:47.8357392Z Got exit code 1 2025-12-04T10:13:47.8357984Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T10:13:47.8358345Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:47.8358893Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2525c9886ebe84d6.xml 2025-12-04T10:13:47.8359035Z ============================= test session starts ============================== 2025-12-04T10:13:47.8359345Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.8359437Z cachedir: .pytest_cache 2025-12-04T10:13:47.8359891Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.8359996Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.8360088Z configfile: pytest.ini 2025-12-04T10:13:47.8360565Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.8360752Z collecting ... collected 60 items / 4 deselected / 56 selected 2025-12-04T10:13:47.8360873Z stepcurrent: skipping 4 already run items. 2025-12-04T10:13:47.8361004Z Running 29 items in this shard 2025-12-04T10:13:47.8361011Z 2025-12-04T10:13:47.8361941Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda I1204 09:31:24.099000 53247 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 53299 2025-12-04T10:13:47.8362384Z I1204 09:31:24.100000 53247 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 53300 2025-12-04T10:13:47.8362816Z I1204 09:31:24.101000 53247 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 53301 2025-12-04T10:13:47.8363321Z I1204 09:31:24.102000 53247 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 53302 2025-12-04T10:13:47.8364858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
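The leak checker above prints a ready-to-run repro command that sets PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, and notes that PYTORCH_PRINT_REPRO_ON_FAILURE=0 silences the hint. A minimal Python sketch for driving that repro from a script, assuming the working directory is the PyTorch repo root ("base repo dir") and a CUDA device is visible:

    import os
    import subprocess

    # Run the repro command printed in the failure above with the leak check enabled.
    # This is only a convenience wrapper; running the command directly in a shell is equivalent.
    env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda",
        ],
        env=env,
        check=True,
    )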
2025-12-04T10:13:47.8365057Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8366272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8366398Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8367323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8367443Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8368431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8368559Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8370549Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8370648Z _warn_cpu_init() 2025-12-04T10:13:47.8372520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8372618Z _warn_cpu_init() 2025-12-04T10:13:47.8374852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8374957Z _warn_cpu_init() 2025-12-04T10:13:47.8376987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8377088Z _warn_cpu_init() 2025-12-04T10:13:47.8378085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:47.8378303Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8379594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8379819Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8380802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8381025Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8382003Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8382222Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8386672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.8387174Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.8387951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8388068Z return func(*args, **kwargs) 2025-12-04T10:13:47.8392555Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. 
(Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.8392946Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.8393636Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8393740Z return func(*args, **kwargs) 2025-12-04T10:13:47.8397742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.8398101Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.8398778Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8398920Z return func(*args, **kwargs) 2025-12-04T10:13:47.8402867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.8403242Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.8403919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
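The CPU-init UserWarnings above recommend passing device_id so FSDP moves the module to the local GPU before running sharding initialization. A minimal sketch of that call, assuming a process group has already been initialized; the Linear module is only a placeholder:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Placeholder module for illustration; any nn.Module works here.
    module = nn.Linear(8, 8)
    # device_id makes FSDP perform sharding init on the GPU instead of CPU,
    # which is what the warning above suggests.
    fsdp_model = FSDP(module, device_id=torch.cuda.current_device())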
2025-12-04T10:13:47.8404017Z return func(*args, **kwargs) 2025-12-04T10:13:47.8404683Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:47.8404781Z return func(*args, **kwargs) 2025-12-04T10:13:47.8405447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:47.8405536Z return func(*args, **kwargs) 2025-12-04T10:13:47.8406208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:47.8406298Z return func(*args, **kwargs) 2025-12-04T10:13:47.8406994Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:47.8407093Z return func(*args, **kwargs) 2025-12-04T10:13:47.8407969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:47.8408067Z return func(*args, **kwargs) 2025-12-04T10:13:47.8408473Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8408986Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8409869Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8410318Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8411190Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8411538Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8412386Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8412846Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8413968Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8414454Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8415403Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8415853Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8416811Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8417303Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8418971Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 707723264 and is now 783220736. 2025-12-04T10:13:47.8419337Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8419993Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8421162Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8421530Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8422247Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8422793Z [rank0]:E1204 09:31:55.073000 53299 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.8423269Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8423795Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8424799Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8425301Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8426447Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8426828Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8427678Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T10:13:47.8428132Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8428977Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8429415Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8430261Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8430661Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8431507Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8431942Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8433424Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. CUDA driver allocated memory was 609157120 and is now 674168832. 
2025-12-04T10:13:47.8433782Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8434366Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8435374Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8435700Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8436365Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8436864Z [rank1]:E1204 09:31:55.075000 53300 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.8437262Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8437734Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8438618Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8439065Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8439969Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8440322Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8441201Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8441627Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8442464Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8442898Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8443737Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8444139Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8444983Z [rank2]:E1204 09:31:55.075000 53301 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8445420Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8446934Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 2. CUDA driver allocated memory was 607059968 and is now 674168832. 2025-12-04T10:13:47.8447268Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8447850Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8448852Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8449203Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8449839Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8450326Z [rank2]:E1204 09:31:55.075000 53301 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.8450724Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8451194Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8452080Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8452558Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8453493Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8454091Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8455058Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8455537Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8456495Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8456984Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8457934Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8458377Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8459329Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8459819Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8461522Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 3. CUDA driver allocated memory was 604962816 and is now 674168832. 2025-12-04T10:13:47.8461889Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8462538Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8463705Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8464078Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8464790Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8465337Z [rank3]:E1204 09:31:55.076000 53302 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.8465433Z dist init r=2, world=4 2025-12-04T10:13:47.8465636Z dist init r=0, world=4 2025-12-04T10:13:47.8465735Z dist init r=3, world=4 2025-12-04T10:13:47.8465823Z dist init r=1, world=4 2025-12-04T10:13:47.8466967Z [rank0]:[W1204 09:31:55.586322520 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.8467090Z FAILED [32.9532s] [ 3%] 2025-12-04T10:13:47.8467097Z 2025-12-04T10:13:47.8467226Z =================================== FAILURES =================================== 2025-12-04T10:13:47.8467533Z __ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda ___ 2025-12-04T10:13:47.8467636Z Traceback (most recent call last): 2025-12-04T10:13:47.8468117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.8468220Z self._join_processes(fn) 2025-12-04T10:13:47.8468731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.8468860Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.8469401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.8469500Z raise RuntimeError(error) 2025-12-04T10:13:47.8469714Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:47.8469820Z Traceback (most recent call last): 2025-12-04T10:13:47.8470295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8470395Z getattr(self, test_name)() 2025-12-04T10:13:47.8470868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8470951Z fn() 2025-12-04T10:13:47.8471395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8471484Z method(*args, **kwargs) 2025-12-04T10:13:47.8471937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8472025Z method(*args, **kwargs) 2025-12-04T10:13:47.8472505Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8472589Z with policy(): 2025-12-04T10:13:47.8473035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8473133Z raise RuntimeError(msg) 2025-12-04T10:13:47.8474210Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 2. CUDA driver allocated memory was 607059968 and is now 674168832. 2025-12-04T10:13:47.8474216Z 2025-12-04T10:13:47.8474411Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8475048Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8475054Z 2025-12-04T10:13:47.8475286Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8475293Z 2025-12-04T10:13:47.8475298Z 2025-12-04T10:13:47.8475496Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.8475726Z Process 2 terminated with exit code 10, terminating remaining processes. 
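The ProcessGroupNCCL warning above asks for an explicit destroy_process_group() before the program exits, and the earlier barrier() warning suggests passing device_id to init_process_group. A minimal sketch of that init/teardown pattern, assuming a torchrun-style launch that sets LOCAL_RANK and the usual rendezvous environment variables:

    import os
    import torch
    import torch.distributed as dist

    # Bind the process group to the local GPU up front (addresses the barrier() warning),
    # and always tear it down before exit (addresses the destroy_process_group() warning).
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    dist.init_process_group("nccl", device_id=torch.device("cuda", local_rank))
    try:
        dist.barrier()  # stand-in for the real training/test body
    finally:
        dist.destroy_process_group()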
2025-12-04T10:13:47.8476437Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2525c9886ebe84d6.xml - 2025-12-04T10:13:47.8476587Z =========================== short test summary info ============================ 2025-12-04T10:13:47.8477347Z FAILED [32.9532s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:47.8477485Z Traceback (most recent call last): 2025-12-04T10:13:47.8477971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8478103Z getattr(self, test_name)() 2025-12-04T10:13:47.8478574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8478798Z fn() 2025-12-04T10:13:47.8479451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8479556Z method(*args, **kwargs) 2025-12-04T10:13:47.8480054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8480162Z method(*args, **kwargs) 2025-12-04T10:13:47.8480668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8480773Z with policy(): 2025-12-04T10:13:47.8481279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8481384Z raise RuntimeError(msg) 2025-12-04T10:13:47.8482605Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 2. CUDA driver allocated memory was 607059968 and is now 674168832. 2025-12-04T10:13:47.8482611Z 2025-12-04T10:13:47.8482823Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8483516Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8483523Z 2025-12-04T10:13:47.8483785Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8483960Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:47.8484213Z ======================= 1 failed, 4 deselected in 33.17s ======================= 2025-12-04T10:13:47.8484308Z Got exit code 1 2025-12-04T10:13:47.8484412Z Retrying single test... 
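Each leak report compares two numbers taken before and after the test: caching-allocator bytes and driver-allocated bytes on the device. A rough sketch of how those quantities can be read, assuming a single visible CUDA device; this is illustrative only and is not PyTorch's internal CudaMemoryLeakCheck:

    import torch

    device = 0
    torch.cuda.synchronize(device)
    # "Caching allocator allocated memory" in the report corresponds to allocator usage.
    allocator_bytes = torch.cuda.memory_allocated(device)
    # "CUDA driver allocated memory" corresponds roughly to total minus free driver memory.
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    driver_bytes = total_bytes - free_bytes
    print(f"caching allocator: {allocator_bytes} bytes, driver allocated: {driver_bytes} bytes")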
2025-12-04T10:13:47.8485031Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-265cb7987b98bd4a.xml 2025-12-04T10:13:47.8485186Z ============================= test session starts ============================== 2025-12-04T10:13:47.8485534Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.8485634Z cachedir: .pytest_cache 2025-12-04T10:13:47.8486181Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.8486309Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.8486411Z configfile: pytest.ini 2025-12-04T10:13:47.8486950Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.8487162Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:47.8487924Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8488034Z Running 1 items in this shard 2025-12-04T10:13:47.8488040Z 2025-12-04T10:13:47.8489081Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda I1204 09:32:01.859000 53584 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 53636 2025-12-04T10:13:47.8489620Z I1204 09:32:01.860000 53584 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 53637 2025-12-04T10:13:47.8490106Z I1204 09:32:01.861000 53584 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 53638 2025-12-04T10:13:47.8490588Z I1204 09:32:01.862000 53584 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 53639 2025-12-04T10:13:47.8491702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8491816Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8492694Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8492813Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8493924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8494052Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8495028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:47.8495158Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8497166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8497270Z _warn_cpu_init() 2025-12-04T10:13:47.8499296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8499400Z _warn_cpu_init() 2025-12-04T10:13:47.8501775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8501888Z _warn_cpu_init() 2025-12-04T10:13:47.8503879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8503971Z _warn_cpu_init() 2025-12-04T10:13:47.8504967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8505215Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8506379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8506596Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8507471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8507657Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8508535Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8508727Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8512688Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.8513068Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.8513756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8513860Z return func(*args, **kwargs) 2025-12-04T10:13:47.8517873Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.8518233Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.8518912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8519040Z return func(*args, **kwargs) 2025-12-04T10:13:47.8522985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.8523363Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.8524042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8524151Z return func(*args, **kwargs) 2025-12-04T10:13:47.8528131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.8528484Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.8529160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8529258Z return func(*args, **kwargs) 2025-12-04T10:13:47.8529923Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8530053Z return func(*args, **kwargs) 2025-12-04T10:13:47.8530743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8530886Z return func(*args, **kwargs) 2025-12-04T10:13:47.8532069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8532237Z return func(*args, **kwargs) 2025-12-04T10:13:47.8533528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T10:13:47.8533863Z return func(*args, **kwargs) 2025-12-04T10:13:47.8535694Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:47.8535994Z return func(*args, **kwargs) 2025-12-04T10:13:47.8536781Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8537811Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8539604Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8540507Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8542390Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8543114Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8544987Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8546011Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8547725Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8548571Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8550455Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8551289Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8553149Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8554084Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8557214Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 714014720 and is now 783220736. 
2025-12-04T10:13:47.8557852Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8558976Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8560980Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8561628Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8563195Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8564192Z [rank0]:E1204 09:32:27.476000 53636 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.8565097Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8566036Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8567930Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8568820Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8570614Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8571205Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8572427Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8572900Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8574151Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8574727Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8575677Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8576132Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8577083Z [rank1]:E1204 09:32:27.477000 53637 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8577620Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8579500Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. CUDA driver allocated memory was 607059968 and is now 674168832. 2025-12-04T10:13:47.8579881Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8580539Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8581680Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8582121Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8582837Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8583426Z [rank1]:E1204 09:32:27.477000 53637 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.8583878Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8584411Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8585413Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8585921Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8586912Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8587311Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8588268Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8588756Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8589757Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8590239Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8591201Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8591601Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8592491Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8592932Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8594406Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 2. CUDA driver allocated memory was 609157120 and is now 674168832. 2025-12-04T10:13:47.8594731Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8595311Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8596349Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8596699Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8597333Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8597817Z [rank2]:E1204 09:32:27.478000 53638 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.8598214Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8598689Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8599568Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8600012Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8600884Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8601233Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8602087Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8602551Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8603403Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8603838Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8604678Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8605106Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8605953Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8606390Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8607861Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 3. CUDA driver allocated memory was 611254272 and is now 674168832. 
2025-12-04T10:13:47.8608184Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8608876Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8609874Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8610221Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8610856Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8611347Z [rank3]:E1204 09:32:27.479000 53639 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.8611438Z dist init r=3, world=4 2025-12-04T10:13:47.8611523Z dist init r=1, world=4 2025-12-04T10:13:47.8611614Z dist init r=0, world=4 2025-12-04T10:13:47.8611697Z dist init r=2, world=4 2025-12-04T10:13:47.8612736Z [rank0]:[W1204 09:32:27.996754735 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.8612827Z FAILED [27.2405s] [100%] 2025-12-04T10:13:47.8612834Z 2025-12-04T10:13:47.8612964Z =================================== FAILURES =================================== 2025-12-04T10:13:47.8613308Z __ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda ___ 2025-12-04T10:13:47.8613414Z Traceback (most recent call last): 2025-12-04T10:13:47.8614103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.8614223Z self._join_processes(fn) 2025-12-04T10:13:47.8614840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.8614994Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.8615599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.8615711Z raise RuntimeError(error) 2025-12-04T10:13:47.8615951Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:47.8616070Z Traceback (most recent call last): 2025-12-04T10:13:47.8616611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8616720Z getattr(self, test_name)() 2025-12-04T10:13:47.8617279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8617376Z fn() 2025-12-04T10:13:47.8617877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8617979Z method(*args, **kwargs) 2025-12-04T10:13:47.8618489Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8618590Z method(*args, **kwargs) 2025-12-04T10:13:47.8619094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8619185Z with policy(): 2025-12-04T10:13:47.8619692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8619807Z raise RuntimeError(msg) 2025-12-04T10:13:47.8621028Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. CUDA driver allocated memory was 607059968 and is now 674168832. 2025-12-04T10:13:47.8621068Z 2025-12-04T10:13:47.8621316Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8622008Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8622014Z 2025-12-04T10:13:47.8622274Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8622280Z 2025-12-04T10:13:47.8622290Z 2025-12-04T10:13:47.8622507Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.8622764Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.8623571Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-265cb7987b98bd4a.xml - 2025-12-04T10:13:47.8623738Z =========================== short test summary info ============================ 2025-12-04T10:13:47.8624589Z FAILED [27.2405s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:47.8624717Z Traceback (most recent call last): 2025-12-04T10:13:47.8625257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8625371Z getattr(self, test_name)() 2025-12-04T10:13:47.8625968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8626043Z fn() 2025-12-04T10:13:47.8626497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8626589Z method(*args, **kwargs) 2025-12-04T10:13:47.8627061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8627153Z method(*args, **kwargs) 2025-12-04T10:13:47.8627594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8627686Z with policy(): 2025-12-04T10:13:47.8628135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8628225Z raise RuntimeError(msg) 2025-12-04T10:13:47.8629338Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. CUDA driver allocated memory was 607059968 and is now 674168832. 2025-12-04T10:13:47.8629345Z 2025-12-04T10:13:47.8629536Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8630159Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8630166Z 2025-12-04T10:13:47.8630397Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8630550Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:47.8630714Z ====================== 1 failed, 32 deselected in 27.46s ======================= 2025-12-04T10:13:47.8630798Z Got exit code 1 2025-12-04T10:13:47.8630895Z Retrying single test... 2025-12-04T10:13:47.8631443Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0dad56460685f27c.xml 2025-12-04T10:13:47.8631614Z ============================= test session starts ============================== 2025-12-04T10:13:47.8631923Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.8632018Z cachedir: .pytest_cache 2025-12-04T10:13:47.8632501Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.8632607Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.8632697Z configfile: pytest.ini 2025-12-04T10:13:47.8633339Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.8633538Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:47.8634252Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8634363Z Running 1 items in this shard 2025-12-04T10:13:47.8634368Z 2025-12-04T10:13:47.8635350Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda I1204 09:32:34.120000 53921 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 53973 2025-12-04T10:13:47.8635817Z I1204 09:32:34.121000 53921 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 53974 2025-12-04T10:13:47.8636276Z I1204 09:32:34.121000 53921 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 53975 2025-12-04T10:13:47.8636733Z I1204 09:32:34.122000 53921 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 53976 2025-12-04T10:13:47.8637678Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8637802Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8638756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8638878Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8639805Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8639920Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8640831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8641008Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8642902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8643002Z _warn_cpu_init() 2025-12-04T10:13:47.8644870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8644990Z _warn_cpu_init() 2025-12-04T10:13:47.8646928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8647045Z _warn_cpu_init() 2025-12-04T10:13:47.8648809Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8648892Z _warn_cpu_init() 2025-12-04T10:13:47.8649773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8649967Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8650842Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8651032Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8651904Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8652121Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8652993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8653238Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8657893Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.8658529Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.8659879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8660080Z return func(*args, **kwargs) 2025-12-04T10:13:47.8664682Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T10:13:47.8665125Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.8666002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8666112Z return func(*args, **kwargs) 2025-12-04T10:13:47.8670386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.8670741Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.8671421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8671523Z return func(*args, **kwargs) 2025-12-04T10:13:47.8675514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.8675869Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.8676548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8676675Z return func(*args, **kwargs) 2025-12-04T10:13:47.8677350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T10:13:47.8677483Z return func(*args, **kwargs) 2025-12-04T10:13:47.8678153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8678248Z return func(*args, **kwargs) 2025-12-04T10:13:47.8679243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8679349Z return func(*args, **kwargs) 2025-12-04T10:13:47.8680111Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8680214Z return func(*args, **kwargs) 2025-12-04T10:13:47.8681211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:47.8681324Z return func(*args, **kwargs) 2025-12-04T10:13:47.8681786Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8682322Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8683324Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8683902Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8684890Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8685283Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8686242Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8686764Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8687728Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8688212Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8689162Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8689609Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8690572Z [rank0]:E1204
09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8691211Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8692815Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 2025-12-04T10:13:47.8693178Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8694012Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8695158Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8695524Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8696231Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8696778Z [rank0]:E1204 09:32:59.332000 53973 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.8697227Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8697759Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8698789Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8699294Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8700275Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8700667Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8701649Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8702133Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8703096Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8703575Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8704525Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8704970Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8706052Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8706514Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8707989Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. CUDA driver allocated memory was 604962816 and is now 674168832. 2025-12-04T10:13:47.8708315Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8708897Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8709903Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8710228Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8710857Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8711338Z [rank1]:E1204 09:32:59.332000 53974 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.8711741Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8712256Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8713138Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8713584Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8714462Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8714837Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8715692Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8716123Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8716977Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8717399Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8718440Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8718892Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8719815Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8720281Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8721848Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 2. CUDA driver allocated memory was 609157120 and is now 674168832. 
2025-12-04T10:13:47.8722195Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8722806Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8723874Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8724217Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8724881Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8725398Z [rank2]:E1204 09:32:59.333000 53975 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.8725849Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8726354Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8727291Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8727927Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8728916Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8729305Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8730333Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8730790Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8731691Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8732179Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8733070Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8733581Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8734710Z [rank3]:E1204 09:32:59.333000 53976 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8735209Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8736877Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 3. CUDA driver allocated memory was 604962816 and is now 674168832. 2025-12-04T10:13:47.8737253Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8737911Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8739043Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8739412Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8740123Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8740701Z [rank3]:E1204 09:32:59.333000 53976 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.8740804Z dist init r=2, world=4 2025-12-04T10:13:47.8740898Z dist init r=3, world=4 2025-12-04T10:13:47.8741000Z dist init r=0, world=4 2025-12-04T10:13:47.8741095Z dist init r=1, world=4 2025-12-04T10:13:47.8742255Z [rank0]:[W1204 09:32:59.853028470 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.8742356Z FAILED [27.1165s] [100%] 2025-12-04T10:13:47.8742367Z 2025-12-04T10:13:47.8742539Z =================================== FAILURES =================================== 2025-12-04T10:13:47.8742859Z __ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda ___ 2025-12-04T10:13:47.8742976Z Traceback (most recent call last): 2025-12-04T10:13:47.8743522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.8743638Z self._join_processes(fn) 2025-12-04T10:13:47.8744216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.8744360Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.8744959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.8745069Z raise RuntimeError(error) 2025-12-04T10:13:47.8745309Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.8745454Z Traceback (most recent call last): 2025-12-04T10:13:47.8746185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8746280Z getattr(self, test_name)() 2025-12-04T10:13:47.8746775Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8746857Z fn() 2025-12-04T10:13:47.8747304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8747390Z method(*args, **kwargs) 2025-12-04T10:13:47.8747844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8747935Z method(*args, **kwargs) 2025-12-04T10:13:47.8748384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8748470Z with policy(): 2025-12-04T10:13:47.8748915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8749019Z raise RuntimeError(msg) 2025-12-04T10:13:47.8750097Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 2025-12-04T10:13:47.8750103Z 2025-12-04T10:13:47.8750298Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8750903Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8750908Z 2025-12-04T10:13:47.8751142Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8751146Z 2025-12-04T10:13:47.8751157Z 2025-12-04T10:13:47.8751348Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.8751603Z Process 0 terminated with exit code 10, terminating remaining processes. 
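The ProcessGroupNCCL warning captured above notes that destroy_process_group() was never called before the worker processes exited. For reference, a minimal teardown that avoids this warning could look like the sketch below; the rendezvous settings and the run_worker name are illustrative, not the test harness's own wiring.

    import os
    import torch
    import torch.distributed as dist

    def run_worker(rank: int, world_size: int) -> None:
        # Illustrative single-node rendezvous; the FSDP test harness sets this up itself.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        torch.cuda.set_device(rank)
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            pass  # test body would run here
        finally:
            dist.barrier()                # let all ranks finish their collectives first
            dist.destroy_process_group()  # explicit shutdown releases NCCL resources

Wrapping the shutdown in a finally block keeps the process group cleanup from being skipped when the test body raises, which is exactly the situation in the failures above.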
2025-12-04T10:13:47.8752321Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0dad56460685f27c.xml - 2025-12-04T10:13:47.8752471Z =========================== short test summary info ============================ 2025-12-04T10:13:47.8753224Z FAILED [27.1165s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.8753335Z Traceback (most recent call last): 2025-12-04T10:13:47.8753844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8753948Z getattr(self, test_name)() 2025-12-04T10:13:47.8754418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8754495Z fn() 2025-12-04T10:13:47.8754942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8755032Z method(*args, **kwargs) 2025-12-04T10:13:47.8755481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8755570Z method(*args, **kwargs) 2025-12-04T10:13:47.8756011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8756103Z with policy(): 2025-12-04T10:13:47.8756551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8756676Z raise RuntimeError(msg) 2025-12-04T10:13:47.8757756Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 2025-12-04T10:13:47.8757802Z 2025-12-04T10:13:47.8757990Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8758600Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8758605Z 2025-12-04T10:13:47.8758837Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8758997Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
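The leak report above comes from PyTorch's CUDA memory-leak checker (enabled here via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1), which snapshots caching-allocator and driver-level memory around the test and fails if both grow. A rough sketch of that kind of comparison is below; it approximates driver usage with torch.cuda.mem_get_info and is illustrative only, not the checker implemented in common_utils.py.

    import torch

    def cuda_usage(device: int) -> tuple[int, int]:
        # Bytes held by the caching allocator, and bytes the driver reports as in use.
        torch.cuda.synchronize(device)
        allocator = torch.cuda.memory_allocated(device)
        free, total = torch.cuda.mem_get_info(device)
        return allocator, total - free

    before = {d: cuda_usage(d) for d in range(torch.cuda.device_count())}
    # ... run the test body here ...
    torch.cuda.empty_cache()  # release cached blocks so genuine leaks stand out
    for d, (alloc_before, driver_before) in before.items():
        alloc_after, driver_after = cuda_usage(d)
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak on device {d}: allocator {alloc_before} -> {alloc_after}, "
                f"driver {driver_before} -> {driver_after}"
            )

Requiring growth in both counters is what the paired "was ... and is now ..." numbers in the error message reflect for each device.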
2025-12-04T10:13:47.8759158Z ====================== 1 failed, 32 deselected in 27.33s ======================= 2025-12-04T10:13:47.8759241Z Got exit code 1 2025-12-04T10:13:47.8759787Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T10:13:47.8760148Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:47.8760694Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0a57eff14e5fabd3.xml 2025-12-04T10:13:47.8760841Z ============================= test session starts ============================== 2025-12-04T10:13:47.8761146Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.8761243Z cachedir: .pytest_cache 2025-12-04T10:13:47.8761700Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.8761807Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.8761904Z configfile: pytest.ini 2025-12-04T10:13:47.8762401Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.8762591Z collecting ... collected 60 items / 5 deselected / 55 selected 2025-12-04T10:13:47.8762713Z stepcurrent: skipping 5 already run items. 2025-12-04T10:13:47.8762815Z Running 28 items in this shard 2025-12-04T10:13:47.8762820Z 2025-12-04T10:13:47.8763751Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda I1204 09:33:05.750000 54258 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 54310 2025-12-04T10:13:47.8764191Z I1204 09:33:05.750000 54258 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 54311 2025-12-04T10:13:47.8764652Z I1204 09:33:05.751000 54258 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 54312 2025-12-04T10:13:47.8765093Z I1204 09:33:05.752000 54258 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 54313 2025-12-04T10:13:47.8765977Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8766104Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8766974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8767097Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8767968Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8768109Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8768990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8769127Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8770924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8771013Z _warn_cpu_init() 2025-12-04T10:13:47.8772789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8772878Z _warn_cpu_init() 2025-12-04T10:13:47.8775033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8775142Z _warn_cpu_init() 2025-12-04T10:13:47.8777173Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8777279Z _warn_cpu_init() 2025-12-04T10:13:47.8778268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8778525Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8779776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8780010Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8780996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:47.8781212Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8782205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8782480Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8783484Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:47.8783634Z return func(*args, **kwargs) 2025-12-04T10:13:47.8784404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8784520Z return func(*args, **kwargs) 2025-12-04T10:13:47.8785282Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8785399Z return func(*args, **kwargs) 2025-12-04T10:13:47.8786158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8786268Z return func(*args, **kwargs) 2025-12-04T10:13:47.8787035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8787145Z return func(*args, **kwargs) 2025-12-04T10:13:47.8787910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8788015Z return func(*args, **kwargs) 2025-12-04T10:13:47.8788766Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8788881Z return func(*args, **kwargs) 2025-12-04T10:13:47.8789639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8789741Z return func(*args, **kwargs) 2025-12-04T10:13:47.8790538Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T10:13:47.8790756Z return func(*args, **kwargs) 2025-12-04T10:13:47.8791173Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8791642Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8792555Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8793009Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8793882Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8794243Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8795090Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8795529Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8796410Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8796842Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8797731Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8798124Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8798978Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8799413Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8800895Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 
2025-12-04T10:13:47.8801219Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8801810Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8802820Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.8803173Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8803813Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8804296Z [rank1]:E1204 09:33:43.236000 54311 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.8804698Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8805164Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8806073Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8806534Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8807400Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8807756Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8808599Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8809082Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8809933Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8810386Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8811239Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8811628Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8812491Z [rank0]:E1204 09:33:43.238000 54310 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8812927Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8814742Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 0. CUDA driver allocated memory was 720306176 and is now 760152064. 2025-12-04T10:13:47.8815103Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8815766Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8816938Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.8817304Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8818027Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8818570Z [rank0]:E1204 09:33:43.238000 54310 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.8819119Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8819651Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8820644Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8821159Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8822150Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8822559Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8823548Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8824070Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8825019Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8825498Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8826506Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8826904Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8827761Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8828200Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8829683Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 2. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T10:13:47.8830004Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8830615Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8831625Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.8831944Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8832581Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8833091Z [rank2]:E1204 09:33:43.239000 54312 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.8833499Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8833966Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8834849Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8835300Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8836170Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8836557Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8837404Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8837867Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8838707Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8839134Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8839994Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8840384Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8841289Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8841995Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8844756Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T10:13:47.8845379Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8846435Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8848165Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.8848757Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8849990Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8851046Z [rank3]:E1204 09:33:43.239000 54313 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.8851235Z dist init r=2, world=4 2025-12-04T10:13:47.8851398Z dist init r=1, world=4 2025-12-04T10:13:47.8851552Z dist init r=3, world=4 2025-12-04T10:13:47.8851705Z dist init r=0, world=4 2025-12-04T10:13:47.8854163Z [rank0]:[W1204 09:33:43.775973716 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.8854348Z FAILED [39.7265s] [ 3%] 2025-12-04T10:13:47.8854368Z 2025-12-04T10:13:47.8854624Z =================================== FAILURES =================================== 2025-12-04T10:13:47.8855288Z ___ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda ___ 2025-12-04T10:13:47.8855501Z Traceback (most recent call last): 2025-12-04T10:13:47.8856484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.8856767Z self._join_processes(fn) 2025-12-04T10:13:47.8857870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.8858119Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.8859276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.8859482Z raise RuntimeError(error) 2025-12-04T10:13:47.8859921Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:47.8860160Z Traceback (most recent call last): 2025-12-04T10:13:47.8861217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8861418Z getattr(self, test_name)() 2025-12-04T10:13:47.8862420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8862575Z fn() 2025-12-04T10:13:47.8863520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8863705Z method(*args, **kwargs) 2025-12-04T10:13:47.8864634Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8864827Z method(*args, **kwargs) 2025-12-04T10:13:47.8865812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8865988Z with policy(): 2025-12-04T10:13:47.8867089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8867348Z raise RuntimeError(msg) 2025-12-04T10:13:47.8869370Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:47.8869388Z 2025-12-04T10:13:47.8869745Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8870891Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.8870911Z 2025-12-04T10:13:47.8871342Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8871355Z 2025-12-04T10:13:47.8871708Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.8871910Z Traceback (most recent call last): 2025-12-04T10:13:47.8872828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8872992Z getattr(self, test_name)() 2025-12-04T10:13:47.8873906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8874051Z fn() 2025-12-04T10:13:47.8874933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8875091Z method(*args, **kwargs) 2025-12-04T10:13:47.8875884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8876055Z method(*args, **kwargs) 2025-12-04T10:13:47.8876961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8877116Z with policy(): 2025-12-04T10:13:47.8877903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8878117Z raise RuntimeError(msg) 2025-12-04T10:13:47.8879566Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T10:13:47.8879575Z 2025-12-04T10:13:47.8879793Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8880486Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.8880498Z 2025-12-04T10:13:47.8880767Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8880773Z 2025-12-04T10:13:47.8880777Z 2025-12-04T10:13:47.8881003Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.8881272Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.8882074Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0a57eff14e5fabd3.xml - 2025-12-04T10:13:47.8882249Z =========================== short test summary info ============================ 2025-12-04T10:13:47.8883098Z FAILED [39.7265s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:47.8883216Z Traceback (most recent call last): 2025-12-04T10:13:47.8883784Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8883896Z getattr(self, test_name)() 2025-12-04T10:13:47.8884913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8885009Z fn() 2025-12-04T10:13:47.8885514Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8885629Z method(*args, **kwargs) 2025-12-04T10:13:47.8886134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8886235Z method(*args, **kwargs) 2025-12-04T10:13:47.8886746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8886845Z with policy(): 2025-12-04T10:13:47.8887407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8887514Z raise RuntimeError(msg) 2025-12-04T10:13:47.8888717Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 
2025-12-04T10:13:47.8888725Z 2025-12-04T10:13:47.8888946Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8889627Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.8889633Z 2025-12-04T10:13:47.8889901Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8889906Z 2025-12-04T10:13:47.8890111Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.8890231Z Traceback (most recent call last): 2025-12-04T10:13:47.8890790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8890938Z getattr(self, test_name)() 2025-12-04T10:13:47.8891585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8891779Z fn() 2025-12-04T10:13:47.8892251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8892353Z method(*args, **kwargs) 2025-12-04T10:13:47.8892825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8892922Z method(*args, **kwargs) 2025-12-04T10:13:47.8893483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8893578Z with policy(): 2025-12-04T10:13:47.8894258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8894367Z raise RuntimeError(msg) 2025-12-04T10:13:47.8895567Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T10:13:47.8895582Z 2025-12-04T10:13:47.8895798Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8896477Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.8896485Z 2025-12-04T10:13:47.8896759Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8896940Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:47.8897151Z ======================= 1 failed, 5 deselected in 39.94s ======================= 2025-12-04T10:13:47.8897260Z Got exit code 1 2025-12-04T10:13:47.8897361Z Retrying single test... 
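Both the failed session above and the retry that follows repeatedly emit _warn_cpu_init(), which recommends passing device_id so FSDP moves a CPU-built module to the GPU before running sharding initialization. A minimal sketch of that call is below, assuming torch.distributed is already initialized with the NCCL backend and one GPU per rank; the tiny Linear model and kwargs are illustrative stand-ins, not the test's own fsdp_kwargs.

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    model = torch.nn.Linear(8, 8)  # stand-in for the test's CPU-constructed model
    device = torch.device("cuda", torch.cuda.current_device())
    # device_id moves the CPU-resident module onto this rank's GPU for sharding init,
    # which is also what the sync_module_states=True flag requires.
    fsdp_model = FSDP(
        model,
        device_id=device,
        sync_module_states=True,
    )

With device_id supplied, the sharding setup runs on the GPU and the CPU-init warning in these sessions would not fire.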
2025-12-04T10:13:47.8897988Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f9981ec75d7ffd49.xml 2025-12-04T10:13:47.8898149Z ============================= test session starts ============================== 2025-12-04T10:13:47.8898492Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.8898607Z cachedir: .pytest_cache 2025-12-04T10:13:47.8899120Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.8899271Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.8899384Z configfile: pytest.ini 2025-12-04T10:13:47.8899920Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.8900141Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:47.8900898Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.8901011Z Running 1 items in this shard 2025-12-04T10:13:47.8901016Z 2025-12-04T10:13:47.8902067Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda I1204 09:33:50.029000 54595 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 54647 2025-12-04T10:13:47.8902567Z I1204 09:33:50.030000 54595 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 54648 2025-12-04T10:13:47.8903092Z I1204 09:33:50.031000 54595 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 54649 2025-12-04T10:13:47.8903584Z I1204 09:33:50.032000 54595 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 54650 2025-12-04T10:13:47.8904615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8904750Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8905831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8906065Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8906938Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8907060Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8907926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:47.8908035Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.8909819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8909907Z _warn_cpu_init() 2025-12-04T10:13:47.8911708Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8911796Z _warn_cpu_init() 2025-12-04T10:13:47.8913596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8913684Z _warn_cpu_init() 2025-12-04T10:13:47.8915448Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.8915534Z _warn_cpu_init() 2025-12-04T10:13:47.8916413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8916641Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8917514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8917742Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8918616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8918816Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8919687Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.8919874Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.8920757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:47.8920854Z return func(*args, **kwargs) 2025-12-04T10:13:47.8921545Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8921638Z return func(*args, **kwargs) 2025-12-04T10:13:47.8922310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8922417Z return func(*args, **kwargs) 2025-12-04T10:13:47.8923087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8923234Z return func(*args, **kwargs) 2025-12-04T10:13:47.8923904Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8923997Z return func(*args, **kwargs) 2025-12-04T10:13:47.8924670Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8924762Z return func(*args, **kwargs) 2025-12-04T10:13:47.8925439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8925562Z return func(*args, **kwargs) 2025-12-04T10:13:47.8926231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.8926335Z return func(*args, **kwargs) 2025-12-04T10:13:47.8927003Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T10:13:47.8927100Z return func(*args, **kwargs) 2025-12-04T10:13:47.8927516Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8927987Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8928881Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8929360Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8930267Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8930616Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8931456Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8931900Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8932750Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8933238Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8934322Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8934775Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8935744Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8936269Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8937945Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
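The RuntimeError above comes from the CUDA memory-leak checker that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables: it snapshots caching-allocator and driver-level memory before the test body and fails the test if both have grown afterwards. A simplified, illustrative sketch of that kind of before/after comparison (not the actual checker in common_utils.py; the function name is made up):

    import torch

    def check_for_cuda_leak(run_test, device=0):
        # Snapshot caching-allocator bytes and the driver-level view before the test.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)
        free, total = torch.cuda.mem_get_info(device)
        driver_before = total - free

        run_test()

        # Re-measure after the test; empty_cache() returns unused cached blocks to
        # the driver, so growth in both numbers suggests a real leak rather than caching.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free

        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak: allocator {alloc_before} -> {alloc_after}, "
                f"driver {driver_before} -> {driver_after}"
            )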
2025-12-04T10:13:47.8938307Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8938971Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8940142Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.8940512Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8941225Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8941772Z [rank0]:E1204 09:34:23.524000 54647 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.8942230Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8942761Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8943791Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8944323Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8945322Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8945820Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8946794Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8947235Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8948081Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8948516Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8949363Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8949764Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8950641Z [rank1]:E1204 09:34:23.524000 54648 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8951075Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8952545Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T10:13:47.8952888Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8953486Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8954491Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.8954829Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8955461Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8955940Z [rank1]:E1204 09:34:23.524000 54648 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.8956376Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8956845Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8957760Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8958205Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8959080Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8959435Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8960286Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8960721Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8961558Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8961995Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8962840Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8963261Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8964117Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8964546Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8966043Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T10:13:47.8966369Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8966960Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8967962Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.8968287Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8968922Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8969432Z [rank3]:E1204 09:34:23.525000 54650 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.8969836Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.8970425Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.8971962Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8972773Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.8973985Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8974390Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.8975349Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8975840Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8976796Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8977289Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.8978311Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8978965Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.8979924Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8980414Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.8982165Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 2. CUDA driver allocated memory was 604962816 and is now 651100160. 
2025-12-04T10:13:47.8982543Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8983206Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8984337Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.8984704Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.8985461Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8986049Z [rank2]:E1204 09:34:23.525000 54649 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.8986159Z dist init r=1, world=4 2025-12-04T10:13:47.8986254Z dist init r=0, world=4 2025-12-04T10:13:47.8986350Z dist init r=3, world=4 2025-12-04T10:13:47.8986448Z dist init r=2, world=4 2025-12-04T10:13:47.8987612Z [rank0]:[W1204 09:34:23.046848307 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.8987719Z FAILED [35.1425s] [100%] 2025-12-04T10:13:47.8987729Z 2025-12-04T10:13:47.8987880Z =================================== FAILURES =================================== 2025-12-04T10:13:47.8988188Z ___ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda ___ 2025-12-04T10:13:47.8988314Z Traceback (most recent call last): 2025-12-04T10:13:47.8988865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.8988982Z self._join_processes(fn) 2025-12-04T10:13:47.8989569Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.8989707Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.8990320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.8990430Z raise RuntimeError(error) 2025-12-04T10:13:47.8990783Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.8990890Z Traceback (most recent call last): 2025-12-04T10:13:47.8991400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8991510Z getattr(self, test_name)() 2025-12-04T10:13:47.8991979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8992059Z fn() 2025-12-04T10:13:47.8992507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8992596Z method(*args, **kwargs) 2025-12-04T10:13:47.8993046Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.8993135Z method(*args, **kwargs) 2025-12-04T10:13:47.8993604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.8993697Z with policy(): 2025-12-04T10:13:47.8994145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.8994241Z raise RuntimeError(msg) 2025-12-04T10:13:47.8995318Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T10:13:47.8995324Z 2025-12-04T10:13:47.8995514Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.8996127Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.8996160Z 2025-12-04T10:13:47.8996392Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.8996397Z 2025-12-04T10:13:47.8996401Z 2025-12-04T10:13:47.8996604Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.8996843Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.8997579Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f9981ec75d7ffd49.xml - 2025-12-04T10:13:47.8997736Z =========================== short test summary info ============================ 2025-12-04T10:13:47.8998477Z FAILED [35.1425s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.8998592Z Traceback (most recent call last): 2025-12-04T10:13:47.8999079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.8999180Z getattr(self, test_name)() 2025-12-04T10:13:47.8999661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.8999739Z fn() 2025-12-04T10:13:47.9000185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9000279Z method(*args, **kwargs) 2025-12-04T10:13:47.9000720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9000815Z method(*args, **kwargs) 2025-12-04T10:13:47.9001256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9001338Z with policy(): 2025-12-04T10:13:47.9001789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9001883Z raise RuntimeError(msg) 2025-12-04T10:13:47.9002989Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T10:13:47.9002997Z 2025-12-04T10:13:47.9003188Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9003791Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.9003797Z 2025-12-04T10:13:47.9004033Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9004188Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:47.9004376Z ====================== 1 failed, 32 deselected in 35.36s ======================= 2025-12-04T10:13:47.9004464Z Got exit code 1 2025-12-04T10:13:47.9004556Z Retrying single test... 2025-12-04T10:13:47.9005113Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9f763e8043031072.xml 2025-12-04T10:13:47.9005254Z ============================= test session starts ============================== 2025-12-04T10:13:47.9005559Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.9005658Z cachedir: .pytest_cache 2025-12-04T10:13:47.9006116Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.9006230Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.9006321Z configfile: pytest.ini 2025-12-04T10:13:47.9006793Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.9007015Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:47.9007688Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.9007824Z Running 1 items in this shard 2025-12-04T10:13:47.9007829Z 2025-12-04T10:13:47.9008749Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda I1204 09:34:30.119000 54932 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 54984 2025-12-04T10:13:47.9009185Z I1204 09:34:30.120000 54932 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 54985 2025-12-04T10:13:47.9009624Z I1204 09:34:30.121000 54932 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 54986 2025-12-04T10:13:47.9010059Z I1204 09:34:30.122000 54932 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 54987 2025-12-04T10:13:47.9010957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9011081Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.9011956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9012076Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.9012947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9013064Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.9014267Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9014406Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.9016419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9016517Z _warn_cpu_init() 2025-12-04T10:13:47.9018558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9018661Z _warn_cpu_init() 2025-12-04T10:13:47.9020669Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9020812Z _warn_cpu_init() 2025-12-04T10:13:47.9022819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9022943Z _warn_cpu_init() 2025-12-04T10:13:47.9023941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9024159Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.9025149Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9025377Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.9026507Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9026703Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.9027570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9027773Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.9028648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:47.9028772Z return func(*args, **kwargs) 2025-12-04T10:13:47.9029461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9029556Z return func(*args, **kwargs) 2025-12-04T10:13:47.9030240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9030333Z return func(*args, **kwargs) 2025-12-04T10:13:47.9031006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9031136Z return func(*args, **kwargs) 2025-12-04T10:13:47.9031803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9031898Z return func(*args, **kwargs) 2025-12-04T10:13:47.9032570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9032662Z return func(*args, **kwargs) 2025-12-04T10:13:47.9033337Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9033427Z return func(*args, **kwargs) 2025-12-04T10:13:47.9034095Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9034219Z return func(*args, **kwargs) 2025-12-04T10:13:47.9034891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T10:13:47.9035013Z return func(*args, **kwargs) 2025-12-04T10:13:47.9035420Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9035890Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9036776Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9037225Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9038106Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9038459Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9039309Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9039739Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9040584Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9041114Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9041964Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9042361Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9043210Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9043670Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9045142Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 0. CUDA driver allocated memory was 711917568 and is now 760152064. 
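The FutureWarning repeated throughout this run notes that FSDP's `NO_SHARD` strategy is deprecated and points at `DistributedDataParallel` instead. A minimal sketch of that replacement, assuming a torchrun launch; the module is a placeholder, not the wrapped model from common_fsdp.py:

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # NO_SHARD keeps full parameters on every rank, which is what DDP already does,
    # so the deprecation message redirects here.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # hypothetical module
    ddp_model = DDP(model, device_ids=[local_rank])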
2025-12-04T10:13:47.9045463Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9046047Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9047057Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.9047410Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9048046Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9048564Z [rank0]:E1204 09:35:03.181000 54984 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.9048963Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9049428Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9050317Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9050768Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9051651Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9052003Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9052849Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9053345Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9054469Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9054957Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9055905Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9056351Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9057344Z [rank1]:E1204 09:35:03.181000 54985 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9057838Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9059501Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:47.9059856Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9060521Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9061680Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.9062074Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9062787Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9063332Z [rank1]:E1204 09:35:03.181000 54985 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.9063778Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9064309Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9065306Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9065808Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9066798Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9067146Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9068175Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9068691Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9069624Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9070090Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9071013Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9071459Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9089139Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9089727Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9091522Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T10:13:47.9092030Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9092743Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9094031Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.9094476Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9095189Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9095741Z [rank2]:E1204 09:35:03.181000 54986 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.9096205Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9096740Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9097758Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9098267Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9099263Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9099665Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9100670Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9101175Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9102132Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9102628Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9103632Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9104091Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9105052Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9105652Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9107271Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 604962816 and is now 651100160. 
2025-12-04T10:13:47.9107631Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9108246Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9109272Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.9109604Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9110234Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9110753Z [rank3]:E1204 09:35:03.182000 54987 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.9110860Z dist init r=2, world=4 2025-12-04T10:13:47.9110971Z dist init r=1, world=4 2025-12-04T10:13:47.9111097Z dist init r=3, world=4 2025-12-04T10:13:47.9111253Z dist init r=0, world=4 2025-12-04T10:13:47.9115462Z [rank0]:[W1204 09:35:03.700231258 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.9115570Z FAILED [35.3595s] [100%] 2025-12-04T10:13:47.9115577Z 2025-12-04T10:13:47.9115720Z =================================== FAILURES =================================== 2025-12-04T10:13:47.9116017Z ___ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda ___ 2025-12-04T10:13:47.9116129Z Traceback (most recent call last): 2025-12-04T10:13:47.9116692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.9116797Z self._join_processes(fn) 2025-12-04T10:13:47.9117361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.9117491Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.9118058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.9118170Z raise RuntimeError(error) 2025-12-04T10:13:47.9118389Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:47.9118499Z Traceback (most recent call last): 2025-12-04T10:13:47.9119041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9119146Z getattr(self, test_name)() 2025-12-04T10:13:47.9119659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9119744Z fn() 2025-12-04T10:13:47.9120224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9120348Z method(*args, **kwargs) 2025-12-04T10:13:47.9121118Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9121288Z method(*args, **kwargs) 2025-12-04T10:13:47.9122168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9122325Z with policy(): 2025-12-04T10:13:47.9123446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9123624Z raise RuntimeError(msg) 2025-12-04T10:13:47.9125803Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:47.9125894Z 2025-12-04T10:13:47.9126261Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9127381Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.9127395Z 2025-12-04T10:13:47.9127847Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9127857Z 2025-12-04T10:13:47.9128134Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.9128343Z Traceback (most recent call last): 2025-12-04T10:13:47.9129304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9129496Z getattr(self, test_name)() 2025-12-04T10:13:47.9130420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9130572Z fn() 2025-12-04T10:13:47.9131481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9131655Z method(*args, **kwargs) 2025-12-04T10:13:47.9132640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9132829Z method(*args, **kwargs) 2025-12-04T10:13:47.9133805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9134157Z with policy(): 2025-12-04T10:13:47.9135104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9135411Z raise RuntimeError(msg) 2025-12-04T10:13:47.9137706Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 604962816 and is now 651100160. 
2025-12-04T10:13:47.9137719Z 2025-12-04T10:13:47.9138120Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9139448Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.9139460Z 2025-12-04T10:13:47.9140037Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9140052Z 2025-12-04T10:13:47.9140061Z 2025-12-04T10:13:47.9140514Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.9141017Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.9142555Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9f763e8043031072.xml - 2025-12-04T10:13:47.9142863Z =========================== short test summary info ============================ 2025-12-04T10:13:47.9144470Z FAILED [35.3595s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:47.9144702Z Traceback (most recent call last): 2025-12-04T10:13:47.9145798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9146092Z getattr(self, test_name)() 2025-12-04T10:13:47.9147015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9147167Z fn() 2025-12-04T10:13:47.9148154Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9148343Z method(*args, **kwargs) 2025-12-04T10:13:47.9149244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9149440Z method(*args, **kwargs) 2025-12-04T10:13:47.9150349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9150532Z with policy(): 2025-12-04T10:13:47.9151570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9151748Z raise RuntimeError(msg) 2025-12-04T10:13:47.9153888Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 
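Each failure above prints the same repro hint: re-run the single test with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 from the base repo dir, and optionally set PYTORCH_PRINT_REPRO_ON_FAILURE=0 to silence the hint. A small convenience wrapper around that command, offered as an illustrative sketch rather than anything in the harness itself:

    import os
    import subprocess

    env = dict(os.environ)
    env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"    # re-enable the leak checker
    # env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"    # uncomment to hide the repro hint

    # Same command the log suggests, run from the base repo dir.
    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda",
        ],
        env=env,
        check=True,  # raises CalledProcessError on a non-zero exit, mirroring "Got exit code 1"
    )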
2025-12-04T10:13:47.9153907Z 2025-12-04T10:13:47.9154293Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9155563Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.9155574Z 2025-12-04T10:13:47.9156019Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9156027Z 2025-12-04T10:13:47.9156280Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.9156497Z Traceback (most recent call last): 2025-12-04T10:13:47.9157450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9157651Z getattr(self, test_name)() 2025-12-04T10:13:47.9158717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9158831Z fn() 2025-12-04T10:13:47.9159487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9159584Z method(*args, **kwargs) 2025-12-04T10:13:47.9160054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9160160Z method(*args, **kwargs) 2025-12-04T10:13:47.9160628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9160768Z with policy(): 2025-12-04T10:13:47.9161242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9161346Z raise RuntimeError(msg) 2025-12-04T10:13:47.9162491Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T10:13:47.9162501Z 2025-12-04T10:13:47.9162703Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9163356Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.9163362Z 2025-12-04T10:13:47.9163611Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9163817Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
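To rerun just this case locally the way the log suggests, the printed command can be wrapped in a small driver. A minimal sketch, assuming you are in the base repo dir; the script path, test id, and both env vars are taken verbatim from the messages above, the subprocess wrapper itself is only illustrative:

import os
import subprocess

# PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables the leak check, as in the repro hint above;
# setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 instead would silence the repro message.
env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")

subprocess.run(
    [
        "python",
        "test/distributed/fsdp/test_fsdp_core.py",
        "TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda",
    ],
    env=env,
    check=False,  # the test is expected to fail while the leak reproduces
)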
2025-12-04T10:13:47.9163995Z ====================== 1 failed, 32 deselected in 35.58s ======================= 2025-12-04T10:13:47.9164087Z Got exit code 1 2025-12-04T10:13:47.9164678Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T10:13:47.9165089Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:47.9165670Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-02dc996acd3ff226.xml 2025-12-04T10:13:47.9165833Z ============================= test session starts ============================== 2025-12-04T10:13:47.9166159Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.9166262Z cachedir: .pytest_cache 2025-12-04T10:13:47.9166758Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.9166873Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.9166984Z configfile: pytest.ini 2025-12-04T10:13:47.9167485Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.9167682Z collecting ... collected 60 items / 6 deselected / 54 selected 2025-12-04T10:13:47.9167825Z stepcurrent: skipping 6 already run items. 2025-12-04T10:13:47.9167930Z Running 27 items in this shard 2025-12-04T10:13:47.9167936Z 2025-12-04T10:13:47.9168938Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda I1204 09:35:09.900000 55269 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 55321 2025-12-04T10:13:47.9169414Z I1204 09:35:09.901000 55269 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 55322 2025-12-04T10:13:47.9169880Z I1204 09:35:09.901000 55269 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 55323 2025-12-04T10:13:47.9170383Z I1204 09:35:09.902000 55269 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 55324 2025-12-04T10:13:47.9172272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9172372Z _warn_cpu_init() 2025-12-04T10:13:47.9174595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:47.9174709Z _warn_cpu_init() 2025-12-04T10:13:47.9176698Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9176834Z _warn_cpu_init() 2025-12-04T10:13:47.9179051Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9179236Z _warn_cpu_init() 2025-12-04T10:13:47.9180225Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:47.9180334Z return func(*args, **kwargs) 2025-12-04T10:13:47.9180805Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9181341Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9182345Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9182851Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9183830Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9184232Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9185187Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9185718Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9186681Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9187171Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9188130Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9188610Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9189573Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9190061Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9191833Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 707723264 and is now 734986240. 2025-12-04T10:13:47.9192196Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9192789Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9193837Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9194160Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9194799Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9195282Z [rank0]:E1204 09:35:45.989000 55321 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.9195695Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9196160Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9197048Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9197493Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9198362Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9198722Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9199595Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9200031Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9200874Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9201313Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9202183Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9202580Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9203437Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9203869Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9205363Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 
2025-12-04T10:13:47.9205713Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9206345Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9207366Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9207686Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9208326Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9208812Z [rank1]:E1204 09:35:45.990000 55322 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.9209222Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9209689Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9210577Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9211024Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9211924Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9212285Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9213129Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9213826Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9214816Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9215309Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9216263Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9216707Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9217673Z [rank2]:E1204 09:35:45.991000 55323 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9218164Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9219880Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:47.9220270Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9220940Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9222091Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9222455Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9223180Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9223722Z [rank2]:E1204 09:35:45.991000 55323 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.9224178Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9224700Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9225705Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9226281Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9227247Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9227610Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9228453Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9228889Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9229761Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9230204Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9231053Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9231447Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9232713Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9233609Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9235547Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:47.9235956Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9236584Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9237682Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9238021Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9238702Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9239211Z [rank3]:E1204 09:35:45.991000 55324 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.9239314Z dist init r=3, world=4 2025-12-04T10:13:47.9239403Z dist init r=2, world=4 2025-12-04T10:13:47.9239492Z dist init r=1, world=4 2025-12-04T10:13:47.9239588Z dist init r=0, world=4 2025-12-04T10:13:47.9240671Z [rank0]:[W1204 09:35:46.526556886 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.9240805Z FAILED [37.8503s] [ 3%] 2025-12-04T10:13:47.9240815Z 2025-12-04T10:13:47.9240951Z =================================== FAILURES =================================== 2025-12-04T10:13:47.9241258Z _ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:47.9241381Z Traceback (most recent call last): 2025-12-04T10:13:47.9241897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.9241998Z self._join_processes(fn) 2025-12-04T10:13:47.9242552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.9242714Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.9243284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.9243390Z raise RuntimeError(error) 2025-12-04T10:13:47.9243716Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:47.9243824Z Traceback (most recent call last): 2025-12-04T10:13:47.9244300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9244399Z getattr(self, test_name)() 2025-12-04T10:13:47.9244874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9244950Z fn() 2025-12-04T10:13:47.9245400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9245524Z method(*args, **kwargs) 2025-12-04T10:13:47.9245968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9246065Z method(*args, **kwargs) 2025-12-04T10:13:47.9246510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9246627Z with policy(): 2025-12-04T10:13:47.9247071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9247164Z raise RuntimeError(msg) 2025-12-04T10:13:47.9248258Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T10:13:47.9248266Z 2025-12-04T10:13:47.9248456Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9249087Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9249095Z 2025-12-04T10:13:47.9249327Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9249332Z 2025-12-04T10:13:47.9249337Z 2025-12-04T10:13:47.9249531Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.9249767Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.9250472Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-02dc996acd3ff226.xml - 2025-12-04T10:13:47.9250626Z =========================== short test summary info ============================ 2025-12-04T10:13:47.9251400Z FAILED [37.8503s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:47.9251534Z Traceback (most recent call last): 2025-12-04T10:13:47.9252029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9252128Z getattr(self, test_name)() 2025-12-04T10:13:47.9252606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9252682Z fn() 2025-12-04T10:13:47.9253124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9253315Z method(*args, **kwargs) 2025-12-04T10:13:47.9253947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9254090Z method(*args, **kwargs) 2025-12-04T10:13:47.9254604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9254701Z with policy(): 2025-12-04T10:13:47.9255216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9255321Z raise RuntimeError(msg) 2025-12-04T10:13:47.9256548Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:47.9256563Z 2025-12-04T10:13:47.9256778Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9257488Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9257548Z 2025-12-04T10:13:47.9257817Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9257994Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
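The "caching allocator" and "CUDA driver" figures in these messages are the two views of GPU memory PyTorch exposes. A rough, hedged analogue of the before/after comparison the leak check performs; this is an illustration only, not the test harness's actual implementation, and the toy allocation stands in for the test body:

import torch

def memory_snapshot(device: int):
    """Return (caching_allocator_bytes, driver_allocated_bytes) for one device."""
    free, total = torch.cuda.mem_get_info(device)       # driver-level view (cudaMemGetInfo)
    return torch.cuda.memory_allocated(device), total - free

if torch.cuda.is_available():
    dev = torch.cuda.current_device()
    before = memory_snapshot(dev)
    x = torch.randn(1024, 1024, device=dev)              # stand-in for the test workload
    del x
    torch.cuda.empty_cache()
    after = memory_snapshot(dev)
    print("caching allocator:", before[0], "->", after[0])
    print("driver allocated: ", before[1], "->", after[1])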
2025-12-04T10:13:47.9258664Z ======================= 1 failed, 6 deselected in 38.07s ======================= 2025-12-04T10:13:47.9258767Z Got exit code 1 2025-12-04T10:13:47.9258870Z Retrying single test... 2025-12-04T10:13:47.9259500Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a9525f3a3720890d.xml 2025-12-04T10:13:47.9259658Z ============================= test session starts ============================== 2025-12-04T10:13:47.9260005Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.9260124Z cachedir: .pytest_cache 2025-12-04T10:13:47.9260647Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.9260765Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.9260875Z configfile: pytest.ini 2025-12-04T10:13:47.9261408Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.9261629Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:47.9262411Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9262519Z Running 1 items in this shard 2025-12-04T10:13:47.9262525Z 2025-12-04T10:13:47.9263598Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda I1204 09:35:52.610000 55606 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 55658 2025-12-04T10:13:47.9264094Z I1204 09:35:52.611000 55606 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 55659 2025-12-04T10:13:47.9264616Z I1204 09:35:52.611000 55606 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 55660 2025-12-04T10:13:47.9265104Z I1204 09:35:52.612000 55606 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 55661 2025-12-04T10:13:47.9267076Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9267198Z _warn_cpu_init() 2025-12-04T10:13:47.9268969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:47.9269054Z _warn_cpu_init() 2025-12-04T10:13:47.9270828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9270948Z _warn_cpu_init() 2025-12-04T10:13:47.9272715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9272833Z _warn_cpu_init() 2025-12-04T10:13:47.9273706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:47.9273811Z return func(*args, **kwargs) 2025-12-04T10:13:47.9274219Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9274693Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9275592Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9276041Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9276929Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9277281Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9278161Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9278748Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9279848Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9280342Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9281358Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9281817Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9282772Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9283271Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9284959Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T10:13:47.9285362Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9286025Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9287216Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9287586Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9288297Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9288848Z [rank2]:E1204 09:36:25.977000 55660 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.9289301Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9289825Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9290828Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9291433Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9292433Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9292820Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9293925Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9294411Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9295362Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9295892Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9296843Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9297301Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9298260Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9298757Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9300438Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 
2025-12-04T10:13:47.9300858Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9301518Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9302675Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9303045Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9303766Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9304317Z [rank0]:E1204 09:36:25.977000 55658 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.9304766Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9305291Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9306460Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9306910Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9307809Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9308163Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9309019Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9309446Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9310330Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9310771Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9311615Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9312012Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9312858Z [rank3]:E1204 09:36:25.977000 55661 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9313326Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9314822Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:47.9315166Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9315756Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9316780Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9317114Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9317746Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9318238Z [rank3]:E1204 09:36:25.977000 55661 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.9318636Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9319103Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9319994Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9320464Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9321341Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9321696Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9322552Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9323010Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9323855Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9324287Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9325132Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9325530Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9326425Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9326865Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9328378Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:47.9328706Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9329287Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9330311Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9330641Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9331271Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9331761Z [rank1]:E1204 09:36:25.977000 55659 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.9331852Z dist init r=3, world=4 2025-12-04T10:13:47.9331939Z dist init r=1, world=4 2025-12-04T10:13:47.9332039Z dist init r=0, world=4 2025-12-04T10:13:47.9332123Z dist init r=2, world=4 2025-12-04T10:13:47.9333171Z [rank0]:[W1204 09:36:26.488170002 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.9333324Z FAILED [34.7319s] [100%] 2025-12-04T10:13:47.9333331Z 2025-12-04T10:13:47.9333460Z =================================== FAILURES =================================== 2025-12-04T10:13:47.9333940Z _ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:47.9334062Z Traceback (most recent call last): 2025-12-04T10:13:47.9334604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.9334724Z self._join_processes(fn) 2025-12-04T10:13:47.9335339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.9335488Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.9336095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.9336205Z raise RuntimeError(error) 2025-12-04T10:13:47.9336448Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:47.9336563Z Traceback (most recent call last): 2025-12-04T10:13:47.9337113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9337225Z getattr(self, test_name)() 2025-12-04T10:13:47.9337753Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9337850Z fn() 2025-12-04T10:13:47.9338387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9338485Z method(*args, **kwargs) 2025-12-04T10:13:47.9339001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9339130Z method(*args, **kwargs) 2025-12-04T10:13:47.9339640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9339734Z with policy(): 2025-12-04T10:13:47.9340239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9340353Z raise RuntimeError(msg) 2025-12-04T10:13:47.9341580Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 
2025-12-04T10:13:47.9341589Z 2025-12-04T10:13:47.9341810Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9342514Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9342522Z 2025-12-04T10:13:47.9342780Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9342786Z 2025-12-04T10:13:47.9342801Z 2025-12-04T10:13:47.9343018Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.9343273Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.9344074Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a9525f3a3720890d.xml - 2025-12-04T10:13:47.9344247Z =========================== short test summary info ============================ 2025-12-04T10:13:47.9345143Z FAILED [34.7319s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:47.9345275Z Traceback (most recent call last): 2025-12-04T10:13:47.9345819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9346037Z getattr(self, test_name)() 2025-12-04T10:13:47.9346507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9346584Z fn() 2025-12-04T10:13:47.9347036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9347127Z method(*args, **kwargs) 2025-12-04T10:13:47.9347601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9347691Z method(*args, **kwargs) 2025-12-04T10:13:47.9348133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9348225Z with policy(): 2025-12-04T10:13:47.9348672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9348768Z raise RuntimeError(msg) 2025-12-04T10:13:47.9349868Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:47.9349874Z 2025-12-04T10:13:47.9350094Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9350719Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9350727Z 2025-12-04T10:13:47.9350987Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9351148Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
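Separately from the leak itself, every run above emits the same three warnings: FSDP's CPU-init warning asking for `device_id`, the c10d barrier warning asking for `device_id` in `init_process_group`, and the ProcessGroupNCCL warning that `destroy_process_group()` was never called. A minimal single-process sketch of the recommended pattern; the rendezvous settings, world size, and toy module are placeholders, not taken from the test:

import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Placeholder rendezvous settings for a one-process group; the real test uses 4 ranks.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

device = torch.device("cuda", 0)
# Passing device_id here is what the barrier() UserWarning above asks for.
dist.init_process_group("nccl", rank=0, world_size=1, device_id=device)

# Passing device_id to FSDP moves the module to GPU for sharding init,
# which is what the _warn_cpu_init() warning above recommends.
model = FSDP(torch.nn.Linear(8, 8), device_id=device)
model(torch.randn(2, 8, device=device)).sum().backward()

# Explicit shutdown avoids the destroy_process_group() warning at exit.
dist.destroy_process_group()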
2025-12-04T10:13:47.9351303Z ====================== 1 failed, 32 deselected in 34.95s ======================= 2025-12-04T10:13:47.9351388Z Got exit code 1 2025-12-04T10:13:47.9351485Z Retrying single test... 2025-12-04T10:13:47.9352037Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0add77cc4faf0004.xml 2025-12-04T10:13:47.9352177Z ============================= test session starts ============================== 2025-12-04T10:13:47.9352490Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.9352582Z cachedir: .pytest_cache 2025-12-04T10:13:47.9353043Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.9353148Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.9353237Z configfile: pytest.ini 2025-12-04T10:13:47.9353713Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.9353900Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:47.9354587Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9354691Z Running 1 items in this shard 2025-12-04T10:13:47.9354696Z 2025-12-04T10:13:47.9355635Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda I1204 09:36:32.239000 55943 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 55995 2025-12-04T10:13:47.9356125Z I1204 09:36:32.240000 55943 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 55996 2025-12-04T10:13:47.9356563Z I1204 09:36:32.241000 55943 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 55997 2025-12-04T10:13:47.9357003Z I1204 09:36:32.242000 55943 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 55998 2025-12-04T10:13:47.9358820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9358908Z _warn_cpu_init() 2025-12-04T10:13:47.9360688Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:47.9360775Z _warn_cpu_init() 2025-12-04T10:13:47.9362550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9362666Z _warn_cpu_init() 2025-12-04T10:13:47.9364461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9364546Z _warn_cpu_init() 2025-12-04T10:13:47.9365432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:47.9365531Z return func(*args, **kwargs) 2025-12-04T10:13:47.9365936Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9366420Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9367305Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9367765Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9368637Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9368994Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9369868Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9370302Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9371155Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9371609Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9372461Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9372854Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9373966Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9374457Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9376136Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 718209024 and is now 734986240. 2025-12-04T10:13:47.9376545Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9377242Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9378403Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9378964Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9379701Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9380247Z [rank0]:E1204 09:37:04.507000 55995 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.9380699Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9381237Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9382233Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9382751Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9383794Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9384198Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9385149Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9385634Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9386641Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9387128Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9388095Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9388535Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9389496Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9389987Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9391803Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 609157120 and is now 625934336. 
2025-12-04T10:13:47.9392164Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9392738Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9393767Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9394084Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9394723Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9395203Z [rank2]:E1204 09:37:04.508000 55997 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.9395596Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9396064Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9396942Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9397414Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9398280Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9398635Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9399479Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9399931Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9400790Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9401214Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9402064Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9402454Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9403307Z [rank3]:E1204 09:37:04.508000 55998 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9403765Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9405287Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:47.9405611Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9406185Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9407219Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9407539Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9408173Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9408651Z [rank3]:E1204 09:37:04.508000 55998 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.9409048Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9409526Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9410432Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9410883Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9411753Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9412106Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9412977Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9413470Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9414578Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9415060Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9416018Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9416494Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9417454Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9417975Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9419660Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:47.9420031Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9420687Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9421845Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9422211Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9422931Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9423473Z [rank1]:E1204 09:37:04.509000 55996 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.9423580Z dist init r=2, world=4 2025-12-04T10:13:47.9423685Z dist init r=0, world=4 2025-12-04T10:13:47.9423781Z dist init r=1, world=4 2025-12-04T10:13:47.9423876Z dist init r=3, world=4 2025-12-04T10:13:47.9425071Z [rank0]:[W1204 09:37:04.019747704 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.9425177Z FAILED [33.7559s] [100%] 2025-12-04T10:13:47.9425183Z 2025-12-04T10:13:47.9425330Z =================================== FAILURES =================================== 2025-12-04T10:13:47.9425649Z _ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:47.9425768Z Traceback (most recent call last): 2025-12-04T10:13:47.9426378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.9426505Z self._join_processes(fn) 2025-12-04T10:13:47.9427033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.9427158Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.9427694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.9427798Z raise RuntimeError(error) 2025-12-04T10:13:47.9428005Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.9428109Z Traceback (most recent call last): 2025-12-04T10:13:47.9428593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9428689Z getattr(self, test_name)() 2025-12-04T10:13:47.9429169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9429272Z fn() 2025-12-04T10:13:47.9429720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9429823Z method(*args, **kwargs) 2025-12-04T10:13:47.9430352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9430443Z method(*args, **kwargs) 2025-12-04T10:13:47.9430897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9430985Z with policy(): 2025-12-04T10:13:47.9431442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9431538Z raise RuntimeError(msg) 2025-12-04T10:13:47.9432629Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 718209024 and is now 734986240. 
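Regarding the ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit"), the suggested cleanup is an explicit teardown at the end of each worker process. A minimal, hypothetical teardown sketch, not code from this test suite:

    import torch.distributed as dist

    def teardown_process_group():
        # Explicitly tear down the default process group before the process
        # exits, as recommended by the ProcessGroupNCCL shutdown warning.
        if dist.is_available() and dist.is_initialized():
            dist.barrier()
            dist.destroy_process_group()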
2025-12-04T10:13:47.9432643Z 2025-12-04T10:13:47.9432836Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9433462Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9433467Z 2025-12-04T10:13:47.9433710Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9433715Z 2025-12-04T10:13:47.9433719Z 2025-12-04T10:13:47.9433913Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.9434153Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.9434860Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0add77cc4faf0004.xml - 2025-12-04T10:13:47.9435009Z =========================== short test summary info ============================ 2025-12-04T10:13:47.9435804Z FAILED [33.7559s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.9435910Z Traceback (most recent call last): 2025-12-04T10:13:47.9436398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9436493Z getattr(self, test_name)() 2025-12-04T10:13:47.9436966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9437047Z fn() 2025-12-04T10:13:47.9437511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9437604Z method(*args, **kwargs) 2025-12-04T10:13:47.9438053Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9438145Z method(*args, **kwargs) 2025-12-04T10:13:47.9438595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9438676Z with policy(): 2025-12-04T10:13:47.9439119Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9439219Z raise RuntimeError(msg) 2025-12-04T10:13:47.9440307Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 718209024 and is now 734986240. 2025-12-04T10:13:47.9440337Z 2025-12-04T10:13:47.9440536Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9441159Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9441191Z 2025-12-04T10:13:47.9441425Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9441590Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
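The repeated _warn_cpu_init() UserWarnings in this log recommend passing device_id so FSDP's sharding initialization runs on the local GPU. A hedged sketch of that construction; the module and rank names are illustrative, not taken from the test:

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_local_gpu(module: torch.nn.Module, rank: int) -> FSDP:
        # Passing device_id moves the module to the local GPU for sharding
        # initialization and is required for sync_module_states=True,
        # per the UserWarning emitted from _init_utils.py.
        return FSDP(
            module,
            device_id=torch.device("cuda", rank),
            sync_module_states=True,
        )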
2025-12-04T10:13:47.9441745Z ====================== 1 failed, 32 deselected in 33.97s ======================= 2025-12-04T10:13:47.9441836Z Got exit code 1 2025-12-04T10:13:47.9442381Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T10:13:47.9442740Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:47.9443289Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-fb044786f28290de.xml 2025-12-04T10:13:47.9443430Z ============================= test session starts ============================== 2025-12-04T10:13:47.9443747Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.9443841Z cachedir: .pytest_cache 2025-12-04T10:13:47.9444293Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.9444406Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.9444499Z configfile: pytest.ini 2025-12-04T10:13:47.9444971Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.9445168Z collecting ... collected 60 items / 7 deselected / 53 selected 2025-12-04T10:13:47.9445289Z stepcurrent: skipping 7 already run items. 2025-12-04T10:13:47.9445395Z Running 26 items in this shard 2025-12-04T10:13:47.9445399Z 2025-12-04T10:13:47.9446383Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda I1204 09:37:10.930000 56280 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 56332 2025-12-04T10:13:47.9446824Z I1204 09:37:10.931000 56280 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 56333 2025-12-04T10:13:47.9447269Z I1204 09:37:10.931000 56280 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 56334 2025-12-04T10:13:47.9447697Z I1204 09:37:10.932000 56280 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 56335 2025-12-04T10:13:47.9448674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9448847Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.9450421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9450646Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.9454199Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:47.9454600Z _warn_cpu_init() 2025-12-04T10:13:47.9458179Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9458496Z _warn_cpu_init() 2025-12-04T10:13:47.9460392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9460780Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.9462604Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9462998Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.9464866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9465104Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.9467085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9467324Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.9471298Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9471503Z _warn_cpu_init() 2025-12-04T10:13:47.9474840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9475032Z _warn_cpu_init() 2025-12-04T10:13:47.9476809Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:47.9477193Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.9479195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9479622Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.9487167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.9487771Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.9492357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.9492719Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.9493519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9493805Z return func(*args, **kwargs) 2025-12-04T10:13:47.9494615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9494727Z return func(*args, **kwargs) 2025-12-04T10:13:47.9499242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.9499646Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.9500410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9500521Z return func(*args, **kwargs) 2025-12-04T10:13:47.9504994Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.9505453Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.9506387Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9506493Z return func(*args, **kwargs) 2025-12-04T10:13:47.9507162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:47.9507258Z return func(*args, **kwargs) 2025-12-04T10:13:47.9507929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:47.9508024Z return func(*args, **kwargs) 2025-12-04T10:13:47.9508687Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
2025-12-04T10:13:47.9508790Z return func(*args, **kwargs) 2025-12-04T10:13:47.9509482Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:47.9509587Z return func(*args, **kwargs) 2025-12-04T10:13:47.9510467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:47.9510558Z return func(*args, **kwargs) 2025-12-04T10:13:47.9510978Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9511447Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9512367Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9512820Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9513706Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9514058Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9514903Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9515371Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9516213Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9516679Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9517527Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9517927Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9518782Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9519215Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9520729Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a 
leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 154112 on device 1. CUDA driver allocated memory was 604962816 and is now 674168832. 2025-12-04T10:13:47.9521051Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9521639Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9522694Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9523026Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9523657Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9524137Z [rank1]:E1204 09:37:18.660000 56333 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.9524587Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9525054Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9525946Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9526392Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9527268Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9527616Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9528488Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9528929Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9529801Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9530236Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9531079Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T10:13:47.9531479Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9532326Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9532756Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9534639Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 707723264 and is now 783220736. 2025-12-04T10:13:47.9535005Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9535698Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9536857Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9537224Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9537940Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9538508Z [rank0]:E1204 09:37:18.660000 56332 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.9538970Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9539496Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9540503Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9541008Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9542001Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9542428Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9543383Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9543908Z [rank3]:E1204 09:37:18.663000 56335 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9544864Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9545360Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9546471Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9546868Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9547720Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9548154Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9549659Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 3. CUDA driver allocated memory was 609157120 and is now 674168832. 2025-12-04T10:13:47.9550012Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9550596Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9551617Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9551946Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9552609Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9553090Z [rank3]:E1204 09:37:18.663000 56335 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.9553493Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9553963Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9554845Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9555292Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T10:13:47.9556196Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9556570Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9557412Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9557850Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9558695Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9559136Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9559980Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9560380Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9561228Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9561660Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9563205Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 156160 on device 2. CUDA driver allocated memory was 607059968 and is now 674168832. 
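Separately, the FutureWarnings above note that the NO_SHARD sharding strategy is deprecated in favor of DistributedDataParallel. A minimal, hypothetical replacement for a NO_SHARD-style wrap, again only a sketch rather than the test's code:

    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_without_sharding(module: nn.Module, rank: int) -> DDP:
        # NO_SHARD keeps parameters fully replicated, so plain DDP on the
        # local GPU is the replacement the FutureWarning points to.
        return DDP(module.cuda(rank), device_ids=[rank])

The nearby AccumulateGrad stream-mismatch UserWarning likewise names its own opt-out, torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False), for cases where the mismatch is intentional.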
2025-12-04T10:13:47.9563531Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9564118Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9565174Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9565507Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9566142Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9566625Z [rank2]:E1204 09:37:18.669000 56334 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.9566725Z dist init r=0, world=4 2025-12-04T10:13:47.9566814Z dist init r=3, world=4 2025-12-04T10:13:47.9566901Z dist init r=2, world=4 2025-12-04T10:13:47.9566992Z dist init r=1, world=4 2025-12-04T10:13:47.9568014Z [rank0]:[W1204 09:37:19.185797847 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.9568136Z FAILED [9.5325s] [ 3%] 2025-12-04T10:13:47.9568143Z 2025-12-04T10:13:47.9568272Z =================================== FAILURES =================================== 2025-12-04T10:13:47.9568557Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda _ 2025-12-04T10:13:47.9568701Z Traceback (most recent call last): 2025-12-04T10:13:47.9569185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.9569291Z self._join_processes(fn) 2025-12-04T10:13:47.9569812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.9569932Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.9570484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.9570586Z raise RuntimeError(error) 2025-12-04T10:13:47.9570789Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.9570909Z Traceback (most recent call last): 2025-12-04T10:13:47.9571385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9571494Z getattr(self, test_name)() 2025-12-04T10:13:47.9571966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9572044Z fn() 2025-12-04T10:13:47.9572499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9572587Z method(*args, **kwargs) 2025-12-04T10:13:47.9573040Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9573132Z method(*args, **kwargs) 2025-12-04T10:13:47.9573816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9573977Z with policy(): 2025-12-04T10:13:47.9574484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9574595Z raise RuntimeError(msg) 2025-12-04T10:13:47.9575851Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 707723264 and is now 783220736. 2025-12-04T10:13:47.9575858Z 2025-12-04T10:13:47.9576070Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9576819Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9576828Z 2025-12-04T10:13:47.9577090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9577098Z 2025-12-04T10:13:47.9577103Z 2025-12-04T10:13:47.9577333Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.9577594Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.9578387Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-fb044786f28290de.xml - 2025-12-04T10:13:47.9578559Z =========================== short test summary info ============================ 2025-12-04T10:13:47.9579627Z FAILED [9.5325s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.9579832Z Traceback (most recent call last): 2025-12-04T10:13:47.9580380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9580493Z getattr(self, test_name)() 2025-12-04T10:13:47.9581076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9581164Z fn() 2025-12-04T10:13:47.9581668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9581778Z method(*args, **kwargs) 2025-12-04T10:13:47.9582423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9582595Z method(*args, **kwargs) 2025-12-04T10:13:47.9583529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9583680Z with policy(): 2025-12-04T10:13:47.9584490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9584612Z raise RuntimeError(msg) 2025-12-04T10:13:47.9585868Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 707723264 and is now 783220736. 2025-12-04T10:13:47.9585876Z 2025-12-04T10:13:47.9586089Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9586801Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9586807Z 2025-12-04T10:13:47.9587083Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9587263Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:47.9587538Z ======================= 1 failed, 7 deselected in 9.75s ======================== 2025-12-04T10:13:47.9587638Z Got exit code 1 2025-12-04T10:13:47.9587739Z Retrying single test... 2025-12-04T10:13:47.9588371Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2d3368174cbc9b5a.xml 2025-12-04T10:13:47.9588530Z ============================= test session starts ============================== 2025-12-04T10:13:47.9588876Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.9588995Z cachedir: .pytest_cache 2025-12-04T10:13:47.9589511Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.9589690Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.9589796Z configfile: pytest.ini 2025-12-04T10:13:47.9590330Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.9590552Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:47.9591405Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9591521Z Running 1 items in this shard 2025-12-04T10:13:47.9591527Z 2025-12-04T10:13:47.9592526Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda I1204 09:37:25.389000 56617 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 56669 2025-12-04T10:13:47.9592996Z I1204 09:37:25.390000 56617 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 56670 2025-12-04T10:13:47.9593598Z I1204 09:37:25.391000 56617 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 56671 2025-12-04T10:13:47.9594036Z I1204 09:37:25.392000 56617 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 56672 2025-12-04T10:13:47.9594959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:47.9595080Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.9596870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9596958Z _warn_cpu_init() 2025-12-04T10:13:47.9597847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9598055Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.9598924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9599043Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.9599913Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9600029Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.9600932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9601047Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.9602819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9602934Z _warn_cpu_init() 2025-12-04T10:13:47.9604702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9604792Z _warn_cpu_init() 2025-12-04T10:13:47.9606562Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
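The _warn_cpu_init warning above recommends passing device_id so FSDP moves the module to GPU before running sharding initialization. A minimal sketch of that recommendation, assuming a torchrun launch with one GPU per rank (the toy nn.Linear model is only illustrative):

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
    model = nn.Linear(8, 8)  # placeholder module, starts on CPU
    # device_id moves the module to the local GPU for sharding init,
    # which avoids the "passed-in `module` is on CPU" warning seen here.
    fsdp_model = FSDP(model, device_id=torch.cuda.current_device())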
2025-12-04T10:13:47.9606672Z _warn_cpu_init() 2025-12-04T10:13:47.9607553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9607781Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.9608872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9609080Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.9610008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9610221Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.9614720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.9615171Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.9615943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9616067Z return func(*args, **kwargs) 2025-12-04T10:13:47.9620580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
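As the AccumulateGrad warning above states, the preferred fix is to drop lingering references to the autograd graph (or perform DDP/FSDP initialization on the same stream as later forwards); if the stream mismatch is known to be intentional, the warning itself points at a switch to silence it:

    import torch

    # Only appropriate when the stream mismatch is intentional, per the warning text.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)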
2025-12-04T10:13:47.9620986Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.9625450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.9626304Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.9626989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9627099Z return func(*args, **kwargs) 2025-12-04T10:13:47.9631054Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.9631408Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.9632113Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9632222Z return func(*args, **kwargs) 2025-12-04T10:13:47.9632899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9633003Z return func(*args, **kwargs) 2025-12-04T10:13:47.9633677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
2025-12-04T10:13:47.9633768Z return func(*args, **kwargs) 2025-12-04T10:13:47.9634472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:47.9634565Z return func(*args, **kwargs) 2025-12-04T10:13:47.9635238Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:47.9635334Z return func(*args, **kwargs) 2025-12-04T10:13:47.9636001Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:47.9636103Z return func(*args, **kwargs) 2025-12-04T10:13:47.9636983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:47.9637114Z return func(*args, **kwargs) 2025-12-04T10:13:47.9637526Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9637997Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9638917Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9639364Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9640250Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9640784Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9641685Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9642152Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9643053Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9643515Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9644438Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9644865Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9645758Z [rank0]:E1204 
09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9646215Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9647842Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 716111872 and is now 783220736. 2025-12-04T10:13:47.9648185Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9648811Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9649888Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9650237Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9650938Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9651452Z [rank0]:E1204 09:37:33.163000 56669 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.9651908Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9652405Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9653425Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9654085Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9655078Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9655475Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9656431Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9656918Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9657874Z [rank1]:E1204 09:37:33.164000 56670 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9658362Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9659347Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9659798Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9660753Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9661263Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9662967Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 1. CUDA driver allocated memory was 604962816 and is now 674168832. 2025-12-04T10:13:47.9663331Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9663995Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9665148Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9665719Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9666484Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9666997Z [rank1]:E1204 09:37:33.164000 56670 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.9667407Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9667875Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9668761Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9669214Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9670095Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9670445Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9671293Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9671737Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9672608Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9673044Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9673889Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9674288Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9675161Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9675596Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9677088Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 2. CUDA driver allocated memory was 611254272 and is now 674168832. 
2025-12-04T10:13:47.9677412Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9677999Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9679440Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9679875Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9680587Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9681130Z [rank2]:E1204 09:37:33.165000 56671 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.9681589Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9682116Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9683120Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9683625Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9684613Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9685004Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9685960Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9686491Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9687445Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9687931Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9688881Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9689362Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9690324Z [rank3]:E1204 09:37:33.165000 56672 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9690809Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9692586Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 3. CUDA driver allocated memory was 609157120 and is now 674168832. 2025-12-04T10:13:47.9692909Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9693593Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9694918Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9695318Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9696033Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9696576Z [rank3]:E1204 09:37:33.165000 56672 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.9696685Z dist init r=1, world=4 2025-12-04T10:13:47.9696784Z dist init r=0, world=4 2025-12-04T10:13:47.9696887Z dist init r=3, world=4 2025-12-04T10:13:47.9696984Z dist init r=2, world=4 2025-12-04T10:13:47.9698140Z [rank0]:[W1204 09:37:33.673594872 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.9698246Z FAILED [9.6951s] [100%] 2025-12-04T10:13:47.9698253Z 2025-12-04T10:13:47.9698395Z =================================== FAILURES =================================== 2025-12-04T10:13:47.9698713Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda _ 2025-12-04T10:13:47.9698843Z Traceback (most recent call last): 2025-12-04T10:13:47.9699388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.9699516Z self._join_processes(fn) 2025-12-04T10:13:47.9700098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.9700267Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.9700879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.9700991Z raise RuntimeError(error) 2025-12-04T10:13:47.9701237Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.9701352Z Traceback (most recent call last): 2025-12-04T10:13:47.9701889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9702005Z getattr(self, test_name)() 2025-12-04T10:13:47.9702563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9702654Z fn() 2025-12-04T10:13:47.9703164Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9703268Z method(*args, **kwargs) 2025-12-04T10:13:47.9703781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9703884Z method(*args, **kwargs) 2025-12-04T10:13:47.9704384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9704484Z with policy(): 2025-12-04T10:13:47.9704987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9705091Z raise RuntimeError(msg) 2025-12-04T10:13:47.9706398Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 3. CUDA driver allocated memory was 609157120 and is now 674168832. 
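The ProcessGroupNCCL warning above concerns shutdown hygiene rather than this test's leak: destroy_process_group() should be called before the process exits. A minimal sketch of the teardown pattern it asks for, assuming the group was created with init_process_group:

    import torch.distributed as dist

    dist.init_process_group("nccl")
    try:
        ...  # training / test body
    finally:
        # Explicit teardown avoids the "destroy_process_group() was not called" warning.
        dist.destroy_process_group()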
2025-12-04T10:13:47.9706438Z 2025-12-04T10:13:47.9706630Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9707291Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9707297Z 2025-12-04T10:13:47.9707529Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9707534Z 2025-12-04T10:13:47.9707538Z 2025-12-04T10:13:47.9707742Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.9707973Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.9708679Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2d3368174cbc9b5a.xml - 2025-12-04T10:13:47.9708841Z =========================== short test summary info ============================ 2025-12-04T10:13:47.9709618Z FAILED [9.6951s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.9709734Z Traceback (most recent call last): 2025-12-04T10:13:47.9710218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9710315Z getattr(self, test_name)() 2025-12-04T10:13:47.9710797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9710876Z fn() 2025-12-04T10:13:47.9711330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9711422Z method(*args, **kwargs) 2025-12-04T10:13:47.9711891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9711993Z method(*args, **kwargs) 2025-12-04T10:13:47.9712435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9712520Z with policy(): 2025-12-04T10:13:47.9712976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9713068Z raise RuntimeError(msg) 2025-12-04T10:13:47.9714199Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 3. CUDA driver allocated memory was 609157120 and is now 674168832. 2025-12-04T10:13:47.9714207Z 2025-12-04T10:13:47.9714396Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9715025Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9715041Z 2025-12-04T10:13:47.9715273Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9715429Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:13:47.9715592Z ======================= 1 failed, 32 deselected in 9.91s ======================= 2025-12-04T10:13:47.9715672Z Got exit code 1 2025-12-04T10:13:47.9715762Z Retrying single test... 2025-12-04T10:13:47.9716325Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f48241fc1d70a928.xml 2025-12-04T10:13:47.9716493Z ============================= test session starts ============================== 2025-12-04T10:13:47.9716806Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.9716904Z cachedir: .pytest_cache 2025-12-04T10:13:47.9717359Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.9717505Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.9717597Z configfile: pytest.ini 2025-12-04T10:13:47.9718064Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.9718262Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:47.9718948Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9719057Z Running 1 items in this shard 2025-12-04T10:13:47.9719064Z 2025-12-04T10:13:47.9720019Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda I1204 09:37:39.860000 56954 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 57006 2025-12-04T10:13:47.9720455Z I1204 09:37:39.861000 56954 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 57007 2025-12-04T10:13:47.9720899Z I1204 09:37:39.861000 56954 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 57008 2025-12-04T10:13:47.9721328Z I1204 09:37:39.862000 56954 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 57009 2025-12-04T10:13:47.9722226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9722348Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.9724165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9724418Z _warn_cpu_init() 2025-12-04T10:13:47.9725350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
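The FutureWarning above suggests DistributedDataParallel as the replacement for FSDP's NO_SHARD strategy. A minimal sketch of that substitution, assuming a torchrun launch with one GPU per rank (the nn.Linear model is only a placeholder):

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group("nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)
    model = nn.Linear(8, 8).cuda()
    # DDP replicates the module on each rank, which is what NO_SHARD effectively did.
    ddp_model = DDP(model, device_ids=[local_rank])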
2025-12-04T10:13:47.9725567Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.9726534Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9726664Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.9727579Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9727699Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.9728627Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9728746Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:47.9730631Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9730782Z _warn_cpu_init() 2025-12-04T10:13:47.9732665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9732759Z _warn_cpu_init() 2025-12-04T10:13:47.9734949Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9735046Z _warn_cpu_init() 2025-12-04T10:13:47.9736034Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9736261Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.9737246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:47.9737472Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.9738489Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:47.9738717Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:47.9743224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.9743631Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.9744403Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9744520Z return func(*args, **kwargs) 2025-12-04T10:13:47.9748974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.9749525Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.9753765Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.9754121Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.9758126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:47.9758471Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:47.9759163Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9759260Z return func(*args, **kwargs) 2025-12-04T10:13:47.9759934Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9760039Z return func(*args, **kwargs) 2025-12-04T10:13:47.9760712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:47.9760844Z return func(*args, **kwargs) 2025-12-04T10:13:47.9761517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:47.9761636Z return func(*args, **kwargs) 2025-12-04T10:13:47.9762310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:47.9762401Z return func(*args, **kwargs) 2025-12-04T10:13:47.9763080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
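The barrier() UserWarning earlier in this run notes that the collective has to guess which device to use, and that passing device_id to init_process_group silences it. A minimal sketch of that, assuming a recent PyTorch where init_process_group accepts device_id and a torchrun launch that sets LOCAL_RANK:

    import os
    import torch
    import torch.distributed as dist

    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    # Binding the process group to a device lets barrier() pick it directly.
    dist.init_process_group("nccl", device_id=torch.device(f"cuda:{local_rank}"))
    dist.barrier()
    dist.destroy_process_group()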
2025-12-04T10:13:47.9763172Z return func(*args, **kwargs) 2025-12-04T10:13:47.9763843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:47.9763946Z return func(*args, **kwargs) 2025-12-04T10:13:47.9764828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:47.9764937Z return func(*args, **kwargs) 2025-12-04T10:13:47.9765344Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9765822Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9766721Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9767201Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9768077Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9768429Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9769275Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9769744Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9770597Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9771035Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9772076Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9772795Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9774700Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9775671Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9778974Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a 
leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 718209024 and is now 783220736. 2025-12-04T10:13:47.9779781Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9780966Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9783090Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9783823Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9785101Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9786128Z [rank0]:E1204 09:37:47.499000 57006 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.9786929Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9787940Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9790030Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9791132Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9793022Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9793712Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9795562Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9796447Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9798147Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9799040Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9800812Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T10:13:47.9801659Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9803688Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9804537Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9807383Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 2. CUDA driver allocated memory was 607059968 and is now 674168832. 2025-12-04T10:13:47.9807993Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9809035Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9810347Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9810708Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9811380Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9811897Z [rank2]:E1204 09:37:47.500000 57008 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.9812329Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9812911Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9814168Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9814677Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9815684Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9816129Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9817093Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9817585Z [rank1]:E1204 09:37:47.500000 57007 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9818545Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9819037Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9820001Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9820485Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9821471Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9821954Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9823651Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 1. CUDA driver allocated memory was 611254272 and is now 674168832. 2025-12-04T10:13:47.9824015Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9824680Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9825839Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9826293Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9826928Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9827412Z [rank1]:E1204 09:37:47.500000 57007 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.9827847Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9828319Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9829206Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9829652Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T10:13:47.9830558Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9830913Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9831759Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9832192Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9833036Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9833502Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9834344Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9834771Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9835620Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9836050Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9837560Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 3. CUDA driver allocated memory was 604962816 and is now 674168832. 
2025-12-04T10:13:47.9837881Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9838468Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9839490Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9839820Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9840483Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9840962Z [rank3]:E1204 09:37:47.501000 57009 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.9841067Z dist init r=3, world=4 2025-12-04T10:13:47.9841155Z dist init r=2, world=4 2025-12-04T10:13:47.9841249Z dist init r=1, world=4 2025-12-04T10:13:47.9841335Z dist init r=0, world=4 2025-12-04T10:13:47.9842359Z [rank0]:[W1204 09:37:47.014148405 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.9842456Z FAILED [9.8142s] [100%] 2025-12-04T10:13:47.9842465Z 2025-12-04T10:13:47.9842625Z =================================== FAILURES =================================== 2025-12-04T10:13:47.9842919Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda _ 2025-12-04T10:13:47.9843026Z Traceback (most recent call last): 2025-12-04T10:13:47.9843511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.9843614Z self._join_processes(fn) 2025-12-04T10:13:47.9844132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.9844256Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.9844799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.9844899Z raise RuntimeError(error) 2025-12-04T10:13:47.9845144Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.9845249Z Traceback (most recent call last): 2025-12-04T10:13:47.9845730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9845882Z getattr(self, test_name)() 2025-12-04T10:13:47.9846351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9846429Z fn() 2025-12-04T10:13:47.9846881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9846974Z method(*args, **kwargs) 2025-12-04T10:13:47.9847427Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9847517Z method(*args, **kwargs) 2025-12-04T10:13:47.9847964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9848059Z with policy(): 2025-12-04T10:13:47.9848508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9848603Z raise RuntimeError(msg) 2025-12-04T10:13:47.9849712Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 718209024 and is now 783220736. 2025-12-04T10:13:47.9849718Z 2025-12-04T10:13:47.9849912Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9850542Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9850549Z 2025-12-04T10:13:47.9850785Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9850791Z 2025-12-04T10:13:47.9850942Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:47.9851078Z Traceback (most recent call last): 2025-12-04T10:13:47.9851556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9851664Z getattr(self, test_name)() 2025-12-04T10:13:47.9852134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9852223Z fn() 2025-12-04T10:13:47.9852664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9852753Z method(*args, **kwargs) 2025-12-04T10:13:47.9853278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9853409Z method(*args, **kwargs) 2025-12-04T10:13:47.9854048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9854155Z with policy(): 2025-12-04T10:13:47.9854663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9854777Z raise RuntimeError(msg) 2025-12-04T10:13:47.9856014Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 2. CUDA driver allocated memory was 607059968 and is now 674168832. 
2025-12-04T10:13:47.9856020Z 2025-12-04T10:13:47.9856229Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9856943Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9856982Z 2025-12-04T10:13:47.9857241Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9857249Z 2025-12-04T10:13:47.9857254Z 2025-12-04T10:13:47.9857507Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.9857767Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:47.9858569Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f48241fc1d70a928.xml - 2025-12-04T10:13:47.9858739Z =========================== short test summary info ============================ 2025-12-04T10:13:47.9859613Z FAILED [9.8142s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:47.9859749Z Traceback (most recent call last): 2025-12-04T10:13:47.9860296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9860403Z getattr(self, test_name)() 2025-12-04T10:13:47.9860947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9861034Z fn() 2025-12-04T10:13:47.9861543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9861645Z method(*args, **kwargs) 2025-12-04T10:13:47.9862147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9862263Z method(*args, **kwargs) 2025-12-04T10:13:47.9862760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9862865Z with policy(): 2025-12-04T10:13:47.9863395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9863501Z raise RuntimeError(msg) 2025-12-04T10:13:47.9864759Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 718209024 and is now 783220736. 
2025-12-04T10:13:47.9864765Z 2025-12-04T10:13:47.9864974Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9865686Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9865692Z 2025-12-04T10:13:47.9866074Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9866080Z 2025-12-04T10:13:47.9866223Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:47.9866341Z Traceback (most recent call last): 2025-12-04T10:13:47.9866823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9866935Z getattr(self, test_name)() 2025-12-04T10:13:47.9867407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9867484Z fn() 2025-12-04T10:13:47.9867933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9868023Z method(*args, **kwargs) 2025-12-04T10:13:47.9868466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9868596Z method(*args, **kwargs) 2025-12-04T10:13:47.9869033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9869129Z with policy(): 2025-12-04T10:13:47.9869580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9869704Z raise RuntimeError(msg) 2025-12-04T10:13:47.9870812Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 2. CUDA driver allocated memory was 607059968 and is now 674168832. 2025-12-04T10:13:47.9870818Z 2025-12-04T10:13:47.9871006Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9871643Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9871650Z 2025-12-04T10:13:47.9871879Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9872039Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:13:47.9872206Z ====================== 1 failed, 32 deselected in 10.03s ======================= 2025-12-04T10:13:47.9872292Z Got exit code 1 2025-12-04T10:13:47.9872850Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T10:13:47.9873211Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:47.9873759Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4808bec29186d3a1.xml 2025-12-04T10:13:47.9873914Z ============================= test session starts ============================== 2025-12-04T10:13:47.9874221Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.9874324Z cachedir: .pytest_cache 2025-12-04T10:13:47.9874809Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.9874917Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.9875018Z configfile: pytest.ini 2025-12-04T10:13:47.9875491Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.9875679Z collecting ... collected 60 items / 8 deselected / 52 selected 2025-12-04T10:13:47.9875816Z stepcurrent: skipping 8 already run items. 2025-12-04T10:13:47.9875915Z Running 25 items in this shard 2025-12-04T10:13:47.9875920Z 2025-12-04T10:13:47.9876898Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda I1204 09:37:54.199000 57291 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 57343 2025-12-04T10:13:47.9877342Z I1204 09:37:54.200000 57291 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 57344 2025-12-04T10:13:47.9877781Z I1204 09:37:54.201000 57291 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 57345 2025-12-04T10:13:47.9878225Z I1204 09:37:54.202000 57291 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 57346 2025-12-04T10:13:47.9880497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9880678Z _warn_cpu_init() 2025-12-04T10:13:47.9882673Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:47.9882821Z _warn_cpu_init() 2025-12-04T10:13:47.9884815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9884924Z _warn_cpu_init() 2025-12-04T10:13:47.9886925Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9887031Z _warn_cpu_init() 2025-12-04T10:13:47.9888027Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:47.9888139Z return func(*args, **kwargs) 2025-12-04T10:13:47.9888611Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9889180Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9890186Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9890690Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9891936Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9892301Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9893535Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9894546Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9895782Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9896290Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9897359Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9897802Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9898813Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9899308Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9900997Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 714014720 and is now 758054912. 2025-12-04T10:13:47.9901363Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9902031Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9903175Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:47.9903531Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9904264Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9904842Z [rank0]:E1204 09:38:40.052000 57343 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:47.9905306Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9905939Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9906829Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9907276Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9908175Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9908541Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9909384Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T10:13:47.9909819Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9910667Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9911132Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9911980Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9912403Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9913263Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9913698Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9915194Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 611254272 and is now 649003008. 
2025-12-04T10:13:47.9915518Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9916104Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9917115Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:47.9917441Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9918111Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9918601Z [rank1]:E1204 09:38:40.053000 57344 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:47.9919007Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9919473Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9920391Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9920838Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9921704Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9922064Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9922915Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9923355Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9924229Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9924693Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9925542Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9925931Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9926785Z [rank2]:E1204 09:38:40.053000 57345 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9927217Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9928713Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 607059968 and is now 649003008. 2025-12-04T10:13:47.9929036Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9929620Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9930629Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:47.9930976Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9931620Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9932100Z [rank2]:E1204 09:38:40.053000 57345 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:47.9932507Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9932980Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9934201Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9934716Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9935705Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9936113Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9937072Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9937596Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9938555Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9939066Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9940029Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9940475Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9941444Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9941934Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9943606Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T10:13:47.9943969Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9944641Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9945904Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:47.9946344Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9946987Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9947464Z [rank3]:E1204 09:38:40.053000 57346 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.9947564Z dist init r=0, world=4 2025-12-04T10:13:47.9947690Z dist init r=3, world=4 2025-12-04T10:13:47.9947779Z dist init r=2, world=4 2025-12-04T10:13:47.9947877Z dist init r=1, world=4 2025-12-04T10:13:47.9948904Z [rank0]:[W1204 09:38:40.561564844 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:47.9949005Z FAILED [47.3679s] [ 4%] 2025-12-04T10:13:47.9949011Z 2025-12-04T10:13:47.9949141Z =================================== FAILURES =================================== 2025-12-04T10:13:47.9949417Z __ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda ___ 2025-12-04T10:13:47.9949530Z Traceback (most recent call last): 2025-12-04T10:13:47.9950012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:47.9950138Z self._join_processes(fn) 2025-12-04T10:13:47.9950671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:47.9950797Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:47.9951342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:47.9951469Z raise RuntimeError(error) 2025-12-04T10:13:47.9951676Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.9951789Z Traceback (most recent call last): 2025-12-04T10:13:47.9952266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9952364Z getattr(self, test_name)() 2025-12-04T10:13:47.9953005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9953093Z fn() 2025-12-04T10:13:47.9953578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9953676Z method(*args, **kwargs) 2025-12-04T10:13:47.9954150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9954255Z method(*args, **kwargs) 2025-12-04T10:13:47.9954723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9954810Z with policy(): 2025-12-04T10:13:47.9955297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9955396Z raise RuntimeError(msg) 2025-12-04T10:13:47.9956557Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T10:13:47.9956566Z 2025-12-04T10:13:47.9956792Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9957452Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:47.9957458Z 2025-12-04T10:13:47.9957709Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9957714Z 2025-12-04T10:13:47.9957719Z 2025-12-04T10:13:47.9957923Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:47.9958178Z Process 3 terminated with exit code 10, terminating remaining processes. 
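The "barrier(): using the device under current context" warning and the ProcessGroupNCCL shutdown warning above both point at process-group setup and teardown rather than at the test logic itself. The following is only a rough sketch of the pattern those warnings recommend (pass `device_id` to `init_process_group` and call `destroy_process_group()` before exit), assuming a torchrun-style launch that sets RANK/LOCAL_RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT; it is not code from test_fsdp_core.py or the test harness.

import os

import torch
import torch.distributed as dist


def main() -> None:
    # Assumption: launched via torchrun, so LOCAL_RANK etc. are present.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)
    # Binding the group to a device up front is what the
    # "specify `device_id` in `init_process_group`" hint refers to.
    dist.init_process_group(backend="nccl", device_id=torch.device("cuda", local_rank))
    try:
        dist.barrier()
        # ... training or test body would run here ...
    finally:
        # Explicit teardown avoids the "destroy_process_group() was not called
        # before program exit" warning and releases communicator resources.
        dist.destroy_process_group()


if __name__ == "__main__":
    main()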
2025-12-04T10:13:47.9958954Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4808bec29186d3a1.xml - 2025-12-04T10:13:47.9959128Z =========================== short test summary info ============================ 2025-12-04T10:13:47.9959936Z FAILED [47.3679s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:47.9960047Z Traceback (most recent call last): 2025-12-04T10:13:47.9960573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9960674Z getattr(self, test_name)() 2025-12-04T10:13:47.9961184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9961265Z fn() 2025-12-04T10:13:47.9961738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9961876Z method(*args, **kwargs) 2025-12-04T10:13:47.9962347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9962445Z method(*args, **kwargs) 2025-12-04T10:13:47.9962923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9963044Z with policy(): 2025-12-04T10:13:47.9963527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9963629Z raise RuntimeError(msg) 2025-12-04T10:13:47.9964775Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T10:13:47.9964794Z 2025-12-04T10:13:47.9964998Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9965645Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:47.9965653Z 2025-12-04T10:13:47.9965908Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9966075Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:47.9966239Z ======================= 1 failed, 8 deselected in 47.59s ======================= 2025-12-04T10:13:47.9966337Z Got exit code 1 2025-12-04T10:13:47.9966431Z Retrying single test... 
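For context on what the leak checker is comparing when it reports "Caching allocator allocated memory was ... and is now ...": it snapshots per-device memory counters before the test body and re-reads them afterwards, failing the test if the numbers grew. A simplified sketch of that idea follows; it is not the actual leak-check code in common_utils.py, and `run_test_fn` is a hypothetical stand-in for the test body.

import torch


def check_for_cuda_leak(run_test_fn, device: int = 0) -> None:
    # Snapshot before the test: bytes held by the caching allocator and
    # free bytes reported by the CUDA driver.
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)
    driver_free_before, _ = torch.cuda.mem_get_info(device)

    run_test_fn()

    # Re-read the same counters; growth the test never released is reported,
    # much like the RuntimeError messages earlier in this log.
    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    driver_free_after, _ = torch.cuda.mem_get_info(device)
    if alloc_after > alloc_before or driver_free_after < driver_free_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after} bytes, driver free memory "
            f"{driver_free_before} -> {driver_free_after} bytes"
        )

Running the failing test with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, as in the repro command printed above, enables the real version of this check inside the PyTorch test harness.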
2025-12-04T10:13:47.9967026Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0fe450ea21eea83e.xml 2025-12-04T10:13:47.9967172Z ============================= test session starts ============================== 2025-12-04T10:13:47.9967500Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:47.9967715Z cachedir: .pytest_cache 2025-12-04T10:13:47.9968197Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:47.9968306Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:47.9968407Z configfile: pytest.ini 2025-12-04T10:13:47.9968879Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:47.9969083Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:47.9969759Z stepcurrent: skipping 8 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:47.9969855Z Running 1 items in this shard 2025-12-04T10:13:47.9969863Z 2025-12-04T10:13:47.9970837Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda I1204 09:38:46.270000 57628 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 57680 2025-12-04T10:13:47.9971281Z I1204 09:38:46.270000 57628 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 57681 2025-12-04T10:13:47.9971728Z I1204 09:38:46.271000 57628 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 57682 2025-12-04T10:13:47.9972156Z I1204 09:38:46.272000 57628 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 57683 2025-12-04T10:13:47.9974237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9974376Z _warn_cpu_init() 2025-12-04T10:13:47.9976381Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9976520Z _warn_cpu_init() 2025-12-04T10:13:47.9978518Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9978816Z _warn_cpu_init() 2025-12-04T10:13:47.9980815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:47.9980919Z _warn_cpu_init() 2025-12-04T10:13:47.9981917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:47.9982037Z return func(*args, **kwargs) 2025-12-04T10:13:47.9982566Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9983105Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9984116Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9984624Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:47.9986071Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:47.9986474Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:47.9987448Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9987933Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9988882Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:47.9989373Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:47.9990374Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:47.9990952Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:47.9991795Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:47.9992236Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:47.9993719Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 611254272 and is now 649003008. 2025-12-04T10:13:47.9994042Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9994629Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:47.9995632Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:47.9995960Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:47.9996602Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:47.9997136Z [rank3]:E1204 09:39:32.266000 57683 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:47.9997537Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:47.9998007Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:47.9998903Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:47.9999376Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0000264Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0000619Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0001470Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0001905Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0002750Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0003213Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0004087Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0004484Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0005326Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0005766Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0007249Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 714014720 and is now 758054912. 2025-12-04T10:13:48.0007572Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0008154Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0009159Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:48.0009487Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0010141Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0010630Z [rank0]:E1204 09:39:32.266000 57680 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.0011027Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0011492Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0012408Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0012859Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0014005Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0014403Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0015373Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0015855Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0016843Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0017363Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0018313Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0018762Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0019718Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0020206Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0021888Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T10:13:48.0022245Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0022906Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0024078Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:48.0024450Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0025164Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0025817Z [rank1]:E1204 09:39:32.266000 57681 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.0026332Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0026826Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0027722Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0028175Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0029051Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0029400Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0030247Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0030713Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0031582Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0032017Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0032855Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0033260Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0034111Z [rank2]:E1204 09:39:32.267000 57682 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0034542Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0036032Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T10:13:48.0036352Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0036941Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0037968Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:48.0038301Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0038936Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0039424Z [rank2]:E1204 09:39:32.267000 57682 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.0039545Z dist init r=0, world=4 2025-12-04T10:13:48.0039631Z dist init r=2, world=4 2025-12-04T10:13:48.0039727Z dist init r=3, world=4 2025-12-04T10:13:48.0039813Z dist init r=1, world=4 2025-12-04T10:13:48.0040830Z [rank0]:[W1204 09:39:32.780168288 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.0040931Z FAILED [47.7405s] [100%] 2025-12-04T10:13:48.0040936Z 2025-12-04T10:13:48.0041063Z =================================== FAILURES =================================== 2025-12-04T10:13:48.0041348Z __ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda ___ 2025-12-04T10:13:48.0041455Z Traceback (most recent call last): 2025-12-04T10:13:48.0041936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.0042075Z self._join_processes(fn) 2025-12-04T10:13:48.0042588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.0042711Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.0043298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.0043395Z raise RuntimeError(error) 2025-12-04T10:13:48.0043609Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.0043710Z Traceback (most recent call last): 2025-12-04T10:13:48.0044182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0044289Z getattr(self, test_name)() 2025-12-04T10:13:48.0044757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0044843Z fn() 2025-12-04T10:13:48.0045286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0045378Z method(*args, **kwargs) 2025-12-04T10:13:48.0045834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0045928Z method(*args, **kwargs) 2025-12-04T10:13:48.0046370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0046463Z with policy(): 2025-12-04T10:13:48.0046914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0047018Z raise RuntimeError(msg) 2025-12-04T10:13:48.0048104Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 611254272 and is now 649003008. 2025-12-04T10:13:48.0048137Z 2025-12-04T10:13:48.0048331Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0048951Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:48.0048956Z 2025-12-04T10:13:48.0049189Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0049193Z 2025-12-04T10:13:48.0049198Z 2025-12-04T10:13:48.0049394Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.0049627Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T10:13:48.0050360Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0fe450ea21eea83e.xml - 2025-12-04T10:13:48.0050520Z =========================== short test summary info ============================ 2025-12-04T10:13:48.0051282Z FAILED [47.7405s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.0051391Z Traceback (most recent call last): 2025-12-04T10:13:48.0051873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0051973Z getattr(self, test_name)() 2025-12-04T10:13:48.0060672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0060813Z fn() 2025-12-04T10:13:48.0061375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0061565Z method(*args, **kwargs) 2025-12-04T10:13:48.0062083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0062220Z method(*args, **kwargs) 2025-12-04T10:13:48.0062720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0062821Z with policy(): 2025-12-04T10:13:48.0063322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0063433Z raise RuntimeError(msg) 2025-12-04T10:13:48.0064649Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 611254272 and is now 649003008. 2025-12-04T10:13:48.0064660Z 2025-12-04T10:13:48.0064871Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0065575Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:48.0065583Z 2025-12-04T10:13:48.0065845Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0066116Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.0066270Z ====================== 1 failed, 32 deselected in 47.96s ======================= 2025-12-04T10:13:48.0066353Z Got exit code 1 2025-12-04T10:13:48.0066447Z Retrying single test... 
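Two warnings recur in the runs above: barrier() choosing the device from the current context, and destroy_process_group() not being called before program exit. Both point at the same per-rank housekeeping. A minimal sketch of that pattern follows; it assumes the usual torchrun-style environment variables (RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT) and a toy all_reduce, and is not taken from the test harness itself.

    import os
    import torch
    import torch.distributed as dist

    def main():
        rank = int(os.environ["RANK"])
        world_size = int(os.environ["WORLD_SIZE"])
        torch.cuda.set_device(rank % torch.cuda.device_count())

        # Passing device_id ties the default process group to this GPU, which is
        # what the "barrier(): using the device under current context" warning suggests.
        dist.init_process_group(
            backend="nccl",
            rank=rank,
            world_size=world_size,
            device_id=torch.device("cuda", torch.cuda.current_device()),
        )

        t = torch.ones(1, device="cuda")
        dist.all_reduce(t)

        # Explicit shutdown avoids the ProcessGroupNCCL warning about
        # destroy_process_group() not being called before exit.
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()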
2025-12-04T10:13:48.0066997Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4119ccbda03fb8bd.xml 2025-12-04T10:13:48.0067142Z ============================= test session starts ============================== 2025-12-04T10:13:48.0067451Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.0067577Z cachedir: .pytest_cache 2025-12-04T10:13:48.0068033Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.0068141Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.0068229Z configfile: pytest.ini 2025-12-04T10:13:48.0068707Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.0068894Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.0069582Z stepcurrent: skipping 8 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:48.0069682Z Running 1 items in this shard 2025-12-04T10:13:48.0069718Z 2025-12-04T10:13:48.0070652Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda I1204 09:39:38.890000 57965 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 58017 2025-12-04T10:13:48.0071099Z I1204 09:39:38.891000 57965 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 58018 2025-12-04T10:13:48.0071535Z I1204 09:39:38.891000 57965 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 58019 2025-12-04T10:13:48.0071967Z I1204 09:39:38.892000 57965 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 58020 2025-12-04T10:13:48.0073758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0073878Z _warn_cpu_init() 2025-12-04T10:13:48.0075669Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0075752Z _warn_cpu_init() 2025-12-04T10:13:48.0077522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0077607Z _warn_cpu_init() 2025-12-04T10:13:48.0079785Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0079881Z _warn_cpu_init() 2025-12-04T10:13:48.0080876Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.0080988Z return func(*args, **kwargs) 2025-12-04T10:13:48.0081518Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0082048Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0083041Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0083553Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0084573Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0084973Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0085932Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0086412Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0087371Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0087893Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0088847Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0089325Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0090286Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0090768Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0092427Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 2025-12-04T10:13:48.0092749Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0093397Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0094682Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:48.0095044Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0095791Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0096336Z [rank0]:E1204 09:40:27.609000 58017 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.0096786Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0097311Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0098301Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0098833Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0099812Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0100210Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0101160Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0101643Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0102642Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0103124Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0104107Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0104547Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0105500Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0106073Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0107557Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 602865664 and is now 649003008. 2025-12-04T10:13:48.0107876Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0108453Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0109463Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:48.0109872Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0110516Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0110991Z [rank1]:E1204 09:40:27.610000 58018 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.0111385Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0111857Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0112770Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0113224Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0114094Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0114455Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0115306Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0115763Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0116615Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0117070Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0117919Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0118305Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0119163Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0119594Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0121070Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 604962816 and is now 649003008. 
2025-12-04T10:13:48.0121393Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0121971Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0123009Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:48.0123331Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0123968Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0124442Z [rank2]:E1204 09:40:27.610000 58019 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.0124839Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0125341Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0126223Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0126674Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0127542Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0127894Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0128770Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0129197Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0130074Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0130497Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0131347Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0131738Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0132588Z [rank3]:E1204 09:40:27.611000 58020 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0133015Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0134844Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 607059968 and is now 649003008. 2025-12-04T10:13:48.0135212Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0135902Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0137053Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:48.0137414Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0138424Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0139486Z [rank3]:E1204 09:40:27.611000 58020 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.0139666Z dist init r=1, world=4 2025-12-04T10:13:48.0139832Z dist init r=0, world=4 2025-12-04T10:13:48.0139996Z dist init r=3, world=4 2025-12-04T10:13:48.0140157Z dist init r=2, world=4 2025-12-04T10:13:48.0142262Z [rank0]:[W1204 09:40:28.121158537 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.0142444Z FAILED [50.2755s] [100%] 2025-12-04T10:13:48.0142454Z 2025-12-04T10:13:48.0142718Z =================================== FAILURES =================================== 2025-12-04T10:13:48.0143282Z __ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda ___ 2025-12-04T10:13:48.0143483Z Traceback (most recent call last): 2025-12-04T10:13:48.0144504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.0144690Z self._join_processes(fn) 2025-12-04T10:13:48.0145828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.0146236Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.0147235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.0147431Z raise RuntimeError(error) 2025-12-04T10:13:48.0147956Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.0148156Z Traceback (most recent call last): 2025-12-04T10:13:48.0149112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0149281Z getattr(self, test_name)() 2025-12-04T10:13:48.0150286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0150434Z fn() 2025-12-04T10:13:48.0151296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0151485Z method(*args, **kwargs) 2025-12-04T10:13:48.0152398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0152577Z method(*args, **kwargs) 2025-12-04T10:13:48.0153454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0153623Z with policy(): 2025-12-04T10:13:48.0154566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0154758Z raise RuntimeError(msg) 2025-12-04T10:13:48.0157172Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 2025-12-04T10:13:48.0157199Z 2025-12-04T10:13:48.0157588Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0158884Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:48.0158895Z 2025-12-04T10:13:48.0159357Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0159367Z 2025-12-04T10:13:48.0159374Z 2025-12-04T10:13:48.0159741Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.0160206Z Process 0 terminated with exit code 10, terminating remaining processes. 
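The UserWarning emitted by _warn_cpu_init() before each failure concerns wrapping a CPU-resident module with FSDP; the warning's own suggestion is to pass device_id so the module is moved to the GPU before sharding initialization. A small sketch of that usage is below; the toy model is illustrative and not from test_fsdp_core.py, and it assumes init_process_group() has already been called on each rank.

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(rank: int) -> FSDP:
        # Module starts on CPU, as in the warning above.
        model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))

        # device_id tells FSDP which GPU to move the module to before sharding,
        # avoiding the slower CPU-side initialization; sync_module_states=True
        # is only legal once the module is on GPU, since it uses GPU collectives.
        return FSDP(
            model,
            device_id=torch.device("cuda", rank),
            sync_module_states=True,
        )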
2025-12-04T10:13:48.0161745Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4119ccbda03fb8bd.xml - 2025-12-04T10:13:48.0162029Z =========================== short test summary info ============================ 2025-12-04T10:13:48.0163553Z FAILED [50.2755s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.0163747Z Traceback (most recent call last): 2025-12-04T10:13:48.0164827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0165006Z getattr(self, test_name)() 2025-12-04T10:13:48.0166044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0166286Z fn() 2025-12-04T10:13:48.0167135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0167301Z method(*args, **kwargs) 2025-12-04T10:13:48.0168164Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0168375Z method(*args, **kwargs) 2025-12-04T10:13:48.0169222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0169380Z with policy(): 2025-12-04T10:13:48.0170190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0170381Z raise RuntimeError(msg) 2025-12-04T10:13:48.0172482Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 2025-12-04T10:13:48.0172495Z 2025-12-04T10:13:48.0172839Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0174289Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:48.0174307Z 2025-12-04T10:13:48.0174783Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0175052Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:13:48.0175323Z ====================== 1 failed, 32 deselected in 50.49s ======================= 2025-12-04T10:13:48.0175433Z Got exit code 1 2025-12-04T10:13:48.0176045Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T10:13:48.0176454Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.0177164Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-77506912df9607dd.xml 2025-12-04T10:13:48.0177327Z ============================= test session starts ============================== 2025-12-04T10:13:48.0177681Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.0177787Z cachedir: .pytest_cache 2025-12-04T10:13:48.0178297Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.0178426Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.0178528Z configfile: pytest.ini 2025-12-04T10:13:48.0179281Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.0179592Z collecting ... collected 60 items / 9 deselected / 51 selected 2025-12-04T10:13:48.0179728Z stepcurrent: skipping 9 already run items. 2025-12-04T10:13:48.0179841Z Running 24 items in this shard 2025-12-04T10:13:48.0179851Z 2025-12-04T10:13:48.0180946Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda I1204 09:40:33.940000 58302 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 58354 2025-12-04T10:13:48.0181448Z I1204 09:40:33.940000 58302 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 58355 2025-12-04T10:13:48.0181937Z I1204 09:40:33.941000 58302 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 58356 2025-12-04T10:13:48.0182421Z I1204 09:40:33.942000 58302 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 58357 2025-12-04T10:13:48.0184484Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0184621Z _warn_cpu_init() 2025-12-04T10:13:48.0186632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.0186730Z _warn_cpu_init() 2025-12-04T10:13:48.0188737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0188831Z _warn_cpu_init() 2025-12-04T10:13:48.0191022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0191110Z _warn_cpu_init() 2025-12-04T10:13:48.0192019Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.0192123Z return func(*args, **kwargs) 2025-12-04T10:13:48.0192527Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0193002Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0193891Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0194366Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0195238Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0195587Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0196437Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0196861Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0197744Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0198171Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0199047Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0199441Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0200283Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0200720Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0202232Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 611254272 and is now 649003008. 2025-12-04T10:13:48.0202560Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0203137Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0204190Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0204535Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0205168Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0205653Z [rank1]:E1204 09:41:19.616000 58355 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.0206053Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0206527Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0207431Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0207877Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0208750Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0209094Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0209951Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0210403Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0211246Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0211698Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0212540Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0212935Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0214053Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0214619Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0216337Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 714014720 and is now 758054912. 
2025-12-04T10:13:48.0216704Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0217352Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0218598Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0218960Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0219665Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0220211Z [rank0]:E1204 09:41:19.617000 58354 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.0220683Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0221216Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0222208Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0222709Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0223699Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0224088Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0225081Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0225561Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0226584Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0227011Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0227855Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0228250Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0229094Z [rank3]:E1204 09:41:19.617000 58357 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0229528Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0231038Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T10:13:48.0231361Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0231967Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0233018Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0233342Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0233973Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0234492Z [rank3]:E1204 09:41:19.617000 58357 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.0234890Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0235364Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0236241Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0236685Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0237558Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0237945Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0238799Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0239256Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0240109Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0240538Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0241383Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0241784Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0242628Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0243240Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0244885Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 607059968 and is now 649003008. 2025-12-04T10:13:48.0245232Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0245844Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0246953Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0247294Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0247989Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0248504Z [rank2]:E1204 09:41:19.617000 58356 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.0248599Z dist init r=1, world=4 2025-12-04T10:13:48.0248687Z dist init r=3, world=4 2025-12-04T10:13:48.0248778Z dist init r=0, world=4 2025-12-04T10:13:48.0249862Z [rank0]:[W1204 09:41:20.114009724 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.0249961Z dist init r=2, world=4 2025-12-04T10:13:48.0250049Z FAILED [47.2583s] [ 4%] 2025-12-04T10:13:48.0250054Z 2025-12-04T10:13:48.0250190Z =================================== FAILURES =================================== 2025-12-04T10:13:48.0250542Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda _ 2025-12-04T10:13:48.0250651Z Traceback (most recent call last): 2025-12-04T10:13:48.0251165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.0251297Z self._join_processes(fn) 2025-12-04T10:13:48.0251838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.0251974Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.0252543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.0252644Z raise RuntimeError(error) 2025-12-04T10:13:48.0252868Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.0252979Z Traceback (most recent call last): 2025-12-04T10:13:48.0253574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0253846Z getattr(self, test_name)() 2025-12-04T10:13:48.0254376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0254554Z fn() 2025-12-04T10:13:48.0255057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0255157Z method(*args, **kwargs) 2025-12-04T10:13:48.0255661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0255760Z method(*args, **kwargs) 2025-12-04T10:13:48.0256260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0256359Z with policy(): 2025-12-04T10:13:48.0256859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0257005Z raise RuntimeError(msg) 2025-12-04T10:13:48.0258258Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 609157120 and is now 649003008. 
2025-12-04T10:13:48.0258265Z 2025-12-04T10:13:48.0258486Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0259219Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0259226Z 2025-12-04T10:13:48.0259515Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0259529Z 2025-12-04T10:13:48.0259534Z 2025-12-04T10:13:48.0259751Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.0260013Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.0260814Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-77506912df9607dd.xml - 2025-12-04T10:13:48.0260978Z =========================== short test summary info ============================ 2025-12-04T10:13:48.0261871Z FAILED [47.2583s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.0261988Z Traceback (most recent call last): 2025-12-04T10:13:48.0262532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0262673Z getattr(self, test_name)() 2025-12-04T10:13:48.0263205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0263331Z fn() 2025-12-04T10:13:48.0263835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0263935Z method(*args, **kwargs) 2025-12-04T10:13:48.0264440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0264539Z method(*args, **kwargs) 2025-12-04T10:13:48.0265035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0265133Z with policy(): 2025-12-04T10:13:48.0265741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0265846Z raise RuntimeError(msg) 2025-12-04T10:13:48.0267084Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T10:13:48.0267092Z 2025-12-04T10:13:48.0267278Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0267933Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0267938Z 2025-12-04T10:13:48.0268167Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0268322Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
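The UserWarning repeated above recommends passing the `device_id` argument to FSDP so sharding initialization runs on GPU instead of CPU (and so `sync_module_states=True` can work). A minimal sketch of that recommendation, assuming a single-GPU, single-process setup chosen only for illustration; the toy Linear module, world_size=1, and the MASTER_ADDR/MASTER_PORT values are assumptions, not taken from the test:

```python
# Sketch of the fix the FSDP UserWarning above suggests: pass `device_id` so the
# CPU-resident module is moved to GPU before sharding init. Single-process,
# single-GPU illustration only (world_size=1 and the port are assumptions).
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main() -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=0, world_size=1)
    try:
        model = torch.nn.Linear(8, 8)  # starts on CPU, as in the warning
        fsdp_model = FSDP(model, device_id=torch.cuda.current_device())
        out = fsdp_model(torch.randn(4, 8, device="cuda"))
        print(out.shape)
    finally:
        # Also addresses the ProcessGroupNCCL warning above about missing cleanup.
        dist.destroy_process_group()

if __name__ == "__main__":
    main()
```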
2025-12-04T10:13:48.0268478Z ======================= 1 failed, 9 deselected in 47.48s ======================= 2025-12-04T10:13:48.0268559Z Got exit code 1 2025-12-04T10:13:48.0268653Z Retrying single test... 2025-12-04T10:13:48.0269223Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-71967038f6397bcb.xml 2025-12-04T10:13:48.0269363Z ============================= test session starts ============================== 2025-12-04T10:13:48.0269669Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.0269760Z cachedir: .pytest_cache 2025-12-04T10:13:48.0270214Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.0270317Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.0270411Z configfile: pytest.ini 2025-12-04T10:13:48.0270912Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.0271100Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.0271820Z stepcurrent: skipping 9 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0271919Z Running 1 items in this shard 2025-12-04T10:13:48.0271924Z 2025-12-04T10:13:48.0272886Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda I1204 09:41:26.160000 58639 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 58691 2025-12-04T10:13:48.0273332Z I1204 09:41:26.160000 58639 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 58692 2025-12-04T10:13:48.0273768Z I1204 09:41:26.161000 58639 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 58693 2025-12-04T10:13:48.0274235Z I1204 09:41:26.162000 58639 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 58694 2025-12-04T10:13:48.0276020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0276141Z _warn_cpu_init() 2025-12-04T10:13:48.0277916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.0278004Z _warn_cpu_init() 2025-12-04T10:13:48.0280213Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0280307Z _warn_cpu_init() 2025-12-04T10:13:48.0282382Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0282481Z _warn_cpu_init() 2025-12-04T10:13:48.0283480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.0283588Z return func(*args, **kwargs) 2025-12-04T10:13:48.0284047Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0284578Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0285612Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0286439Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0288096Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0288516Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0289481Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0290066Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0291135Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0291637Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0292542Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0292957Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0294141Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0294629Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0296347Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 607059968 and is now 649003008. 2025-12-04T10:13:48.0296711Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0297363Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0298592Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0298954Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0299669Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0300209Z [rank1]:E1204 09:42:14.159000 58692 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.0300698Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0301230Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0302225Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0302738Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0303717Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0304120Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0305114Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0305703Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0306709Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0307135Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0307994Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0308388Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0309244Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0309676Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0311192Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 
2025-12-04T10:13:48.0311513Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0312115Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0313175Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0313488Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0314124Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0314630Z [rank0]:E1204 09:42:14.161000 58691 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.0315034Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0315503Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0316380Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0316828Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0317697Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0318095Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0318942Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0319392Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0322567Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0322998Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0323851Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0324242Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0325088Z [rank2]:E1204 09:42:14.162000 58693 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0325543Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0327107Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 611254272 and is now 649003008. 2025-12-04T10:13:48.0327428Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0328013Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0329059Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0329375Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0330041Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0330522Z [rank2]:E1204 09:42:14.162000 58693 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.0330924Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0331392Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0332278Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0332725Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0333964Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0334364Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0335313Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0335863Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0336821Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0337314Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0338259Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0338698Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0339665Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0340151Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0341891Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T10:13:48.0342250Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0342913Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0344122Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0344484Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0345206Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0345855Z [rank3]:E1204 09:42:14.163000 58694 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.0346063Z dist init r=0, world=4 2025-12-04T10:13:48.0346153Z dist init r=2, world=4 2025-12-04T10:13:48.0346237Z dist init r=1, world=4 2025-12-04T10:13:48.0346325Z dist init r=3, world=4 2025-12-04T10:13:48.0347346Z [rank0]:[W1204 09:42:14.670120371 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.0347469Z FAILED [50.3941s] [100%] 2025-12-04T10:13:48.0347477Z 2025-12-04T10:13:48.0347607Z =================================== FAILURES =================================== 2025-12-04T10:13:48.0347911Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda _ 2025-12-04T10:13:48.0348023Z Traceback (most recent call last): 2025-12-04T10:13:48.0348503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.0348597Z self._join_processes(fn) 2025-12-04T10:13:48.0349524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.0349651Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.0350195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.0350295Z raise RuntimeError(error) 2025-12-04T10:13:48.0350503Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.0350614Z Traceback (most recent call last): 2025-12-04T10:13:48.0351090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0351194Z getattr(self, test_name)() 2025-12-04T10:13:48.0351665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0351744Z fn() 2025-12-04T10:13:48.0352194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0352282Z method(*args, **kwargs) 2025-12-04T10:13:48.0352722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0352823Z method(*args, **kwargs) 2025-12-04T10:13:48.0353290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0353382Z with policy(): 2025-12-04T10:13:48.0353826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0353917Z raise RuntimeError(msg) 2025-12-04T10:13:48.0355045Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 
2025-12-04T10:13:48.0355053Z 2025-12-04T10:13:48.0355245Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0355929Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0355936Z 2025-12-04T10:13:48.0356170Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0356175Z 2025-12-04T10:13:48.0356317Z Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.0356427Z Traceback (most recent call last): 2025-12-04T10:13:48.0356910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0357008Z getattr(self, test_name)() 2025-12-04T10:13:48.0357479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0357555Z fn() 2025-12-04T10:13:48.0358003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0358121Z method(*args, **kwargs) 2025-12-04T10:13:48.0358559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0358655Z method(*args, **kwargs) 2025-12-04T10:13:48.0359093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0359181Z with policy(): 2025-12-04T10:13:48.0359625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0359718Z raise RuntimeError(msg) 2025-12-04T10:13:48.0360871Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 607059968 and is now 649003008. 2025-12-04T10:13:48.0360878Z 2025-12-04T10:13:48.0361065Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0361717Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0361722Z 2025-12-04T10:13:48.0361954Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0361959Z 2025-12-04T10:13:48.0361963Z 2025-12-04T10:13:48.0362162Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.0362391Z Process 0 terminated with exit code 10, terminating remaining processes. 
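The c10d warning earlier in this block ("barrier(): using the device under current context") says the process group can be bound to a device by passing `device_id` to `init_process_group`. A minimal single-process sketch of that, assuming one visible GPU; the LOCAL_RANK fallback, the port, and world_size=1 are assumptions, and in the real multi-rank test each rank would pass its own local device:

```python
# Sketch of what the barrier() UserWarning points at: bind the process group to a
# device via `device_id` in init_process_group. Single-process illustration only.
import os
import torch
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
local_rank = int(os.environ.get("LOCAL_RANK", "0"))  # assumed fallback for the demo
torch.cuda.set_device(local_rank)

dist.init_process_group(
    "nccl",
    rank=0,
    world_size=1,
    device_id=torch.device("cuda", local_rank),  # mutes the barrier() warning
)
dist.barrier()                # now unambiguous about which device is used
dist.destroy_process_group()  # explicit shutdown, per the NCCL warning above
```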
2025-12-04T10:13:48.0363093Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-71967038f6397bcb.xml - 2025-12-04T10:13:48.0363245Z =========================== short test summary info ============================ 2025-12-04T10:13:48.0364045Z FAILED [50.3941s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.0364194Z Traceback (most recent call last): 2025-12-04T10:13:48.0364676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0364770Z getattr(self, test_name)() 2025-12-04T10:13:48.0365246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0365325Z fn() 2025-12-04T10:13:48.0365771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0365867Z method(*args, **kwargs) 2025-12-04T10:13:48.0366335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0366430Z method(*args, **kwargs) 2025-12-04T10:13:48.0366869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0366953Z with policy(): 2025-12-04T10:13:48.0367399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0367490Z raise RuntimeError(msg) 2025-12-04T10:13:48.0368603Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 
2025-12-04T10:13:48.0368616Z 2025-12-04T10:13:48.0368801Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0369448Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0369479Z 2025-12-04T10:13:48.0369715Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0369720Z 2025-12-04T10:13:48.0369857Z Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.0369964Z Traceback (most recent call last): 2025-12-04T10:13:48.0370442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0370534Z getattr(self, test_name)() 2025-12-04T10:13:48.0371037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0371112Z fn() 2025-12-04T10:13:48.0371552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0371651Z method(*args, **kwargs) 2025-12-04T10:13:48.0372088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0372187Z method(*args, **kwargs) 2025-12-04T10:13:48.0372626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0372707Z with policy(): 2025-12-04T10:13:48.0373154Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0373323Z raise RuntimeError(msg) 2025-12-04T10:13:48.0374720Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 607059968 and is now 649003008. 2025-12-04T10:13:48.0374729Z 2025-12-04T10:13:48.0374940Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0375707Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0375713Z 2025-12-04T10:13:48.0375979Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0376152Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.0376331Z ====================== 1 failed, 32 deselected in 50.61s ======================= 2025-12-04T10:13:48.0376424Z Got exit code 1 2025-12-04T10:13:48.0376529Z Retrying single test... 
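The runner output above shows the retry protocol: after a failure the run narrows to the single failing item ("Retrying single test...") and reruns it, eventually reporting "FAILED CONSISTENTLY" if it never passes. The loop below is a simplified sketch of that behavior only; the function name, attempt count, and plain pytest invocation are assumptions and not the actual logic in PyTorch's test runner.

```python
# Simplified sketch of the retry behavior visible in the log: rerun only the
# failing test a few times, echoing the exit code each time. Illustrative only.
import subprocess

def retry_single_test(test_id: str, attempts: int = 3) -> bool:
    cmd = ["python", "-m", "pytest", "-v", test_id]
    for attempt in range(1, attempts + 1):
        print(f"attempt {attempt}: {' '.join(cmd)}")
        result = subprocess.run(cmd)
        print(f"Got exit code {result.returncode}")
        if result.returncode == 0:
            return True
        print("Retrying single test...")
    print(f"FAILED CONSISTENTLY: {test_id}")
    return False

if __name__ == "__main__":
    retry_single_test(
        "test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::"
        "test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda"
    )
```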
2025-12-04T10:13:48.0377158Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7aea602ded691711.xml 2025-12-04T10:13:48.0377314Z ============================= test session starts ============================== 2025-12-04T10:13:48.0377687Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.0377794Z cachedir: .pytest_cache 2025-12-04T10:13:48.0378310Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.0378435Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.0378536Z configfile: pytest.ini 2025-12-04T10:13:48.0379289Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.0379510Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.0380327Z stepcurrent: skipping 9 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0380441Z Running 1 items in this shard 2025-12-04T10:13:48.0380515Z 2025-12-04T10:13:48.0381607Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda I1204 09:42:20.999000 58976 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 59028 2025-12-04T10:13:48.0382101Z I1204 09:42:21.000000 58976 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 59029 2025-12-04T10:13:48.0382592Z I1204 09:42:21.001000 58976 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 59030 2025-12-04T10:13:48.0383073Z I1204 09:42:21.002000 58976 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 59031 2025-12-04T10:13:48.0385148Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0385249Z _warn_cpu_init() 2025-12-04T10:13:48.0387255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0387356Z _warn_cpu_init() 2025-12-04T10:13:48.0389390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0389489Z _warn_cpu_init() 2025-12-04T10:13:48.0391486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0391571Z _warn_cpu_init() 2025-12-04T10:13:48.0392481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.0392590Z return func(*args, **kwargs) 2025-12-04T10:13:48.0392996Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0393478Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0394356Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0394804Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0395680Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0396059Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0396914Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0397344Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0398222Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0398650Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0399492Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0399888Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0400736Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0401172Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0402706Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 711917568 and is now 758054912. 2025-12-04T10:13:48.0403032Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0403610Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0404655Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0405009Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0405642Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0406128Z [rank0]:E1204 09:43:16.532000 59028 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.0406522Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0406991Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0407867Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0408337Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0409215Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0409561Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0410417Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0410888Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0411736Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0412164Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0413008Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0413468Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0414580Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0415076Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0416821Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 607059968 and is now 649003008. 2025-12-04T10:13:48.0417182Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0417839Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0419048Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0419414Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0420124Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0420668Z [rank1]:E1204 09:43:16.533000 59029 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.0421118Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0421654Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0422678Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0423184Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0424174Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0424572Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0425675Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0426238Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0427085Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0427509Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0428356Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0428755Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0429603Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0430068Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0431588Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 
2025-12-04T10:13:48.0431913Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0432519Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0433573Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0433894Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0434522Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0435010Z [rank3]:E1204 09:43:16.533000 59031 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.0435404Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0435904Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0436783Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0437227Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0438102Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0438483Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0439339Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0439767Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0440616Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0441044Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0441887Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0442284Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0443157Z [rank2]:E1204 09:43:16.539000 59030 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0443599Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0445112Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 487522304 and is now 649003008. 2025-12-04T10:13:48.0445467Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0446172Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0448073Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0448665Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0449803Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0450800Z [rank2]:E1204 09:43:16.539000 59030 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.0450957Z dist init r=3, world=4 2025-12-04T10:13:48.0451086Z dist init r=0, world=4 2025-12-04T10:13:48.0451227Z dist init r=1, world=4 2025-12-04T10:13:48.0451358Z dist init r=2, world=4 2025-12-04T10:13:48.0453135Z [rank0]:[W1204 09:43:16.046889244 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.0453414Z FAILED [57.7611s] [100%] 2025-12-04T10:13:48.0453519Z 2025-12-04T10:13:48.0453908Z =================================== FAILURES =================================== 2025-12-04T10:13:48.0454526Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda _ 2025-12-04T10:13:48.0454725Z Traceback (most recent call last): 2025-12-04T10:13:48.0455699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.0455894Z self._join_processes(fn) 2025-12-04T10:13:48.0456972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.0457243Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.0458345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.0458531Z raise RuntimeError(error) 2025-12-04T10:13:48.0458946Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.0459152Z Traceback (most recent call last): 2025-12-04T10:13:48.0460152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0460342Z getattr(self, test_name)() 2025-12-04T10:13:48.0461343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0461518Z fn() 2025-12-04T10:13:48.0462573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0462761Z method(*args, **kwargs) 2025-12-04T10:13:48.0463726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0463915Z method(*args, **kwargs) 2025-12-04T10:13:48.0464900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0465081Z with policy(): 2025-12-04T10:13:48.0466163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0466371Z raise RuntimeError(msg) 2025-12-04T10:13:48.0468957Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 711917568 and is now 758054912. 
2025-12-04T10:13:48.0468972Z 2025-12-04T10:13:48.0469310Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0470481Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0470497Z 2025-12-04T10:13:48.0470917Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0470934Z 2025-12-04T10:13:48.0471175Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.0471362Z Traceback (most recent call last): 2025-12-04T10:13:48.0472381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0472551Z getattr(self, test_name)() 2025-12-04T10:13:48.0473437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0473592Z fn() 2025-12-04T10:13:48.0474445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0474613Z method(*args, **kwargs) 2025-12-04T10:13:48.0475470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0475710Z method(*args, **kwargs) 2025-12-04T10:13:48.0476541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0476704Z with policy(): 2025-12-04T10:13:48.0477522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0477703Z raise RuntimeError(msg) 2025-12-04T10:13:48.0480337Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T10:13:48.0480351Z 2025-12-04T10:13:48.0480736Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0482111Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0482128Z 2025-12-04T10:13:48.0482587Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0482597Z 2025-12-04T10:13:48.0482607Z 2025-12-04T10:13:48.0482955Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.0483267Z Process 0 terminated with exit code 10, terminating remaining processes. 
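The c10d_logger warning earlier in this run ("barrier(): using the device under current context") notes that the device can instead be bound when the process group is created. The sketch below illustrates that, assuming a recent PyTorch release where init_process_group accepts device_id and the usual torchrun environment variables are set; init_distributed is a hypothetical helper.

    import os
    import torch
    import torch.distributed as dist

    def init_distributed() -> None:
        # Binding the process group to this rank's CUDA device silences the
        # barrier() device warning, since collectives no longer have to infer
        # the device from the current context.
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)
        dist.init_process_group(
            backend="nccl",
            device_id=torch.device("cuda", local_rank),
        )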
2025-12-04T10:13:48.0484186Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7aea602ded691711.xml - 2025-12-04T10:13:48.0484363Z =========================== short test summary info ============================ 2025-12-04T10:13:48.0485267Z FAILED [57.7611s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.0485389Z Traceback (most recent call last): 2025-12-04T10:13:48.0485947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0486063Z getattr(self, test_name)() 2025-12-04T10:13:48.0486653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0486744Z fn() 2025-12-04T10:13:48.0487250Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0487361Z method(*args, **kwargs) 2025-12-04T10:13:48.0487863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0487962Z method(*args, **kwargs) 2025-12-04T10:13:48.0488466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0488561Z with policy(): 2025-12-04T10:13:48.0489075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0489223Z raise RuntimeError(msg) 2025-12-04T10:13:48.0490498Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 711917568 and is now 758054912. 
2025-12-04T10:13:48.0490512Z 2025-12-04T10:13:48.0490726Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0491662Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0491718Z 2025-12-04T10:13:48.0491972Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0491978Z 2025-12-04T10:13:48.0492126Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.0492245Z Traceback (most recent call last): 2025-12-04T10:13:48.0492758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0492861Z getattr(self, test_name)() 2025-12-04T10:13:48.0493481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0493562Z fn() 2025-12-04T10:13:48.0494224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0494335Z method(*args, **kwargs) 2025-12-04T10:13:48.0494829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0494943Z method(*args, **kwargs) 2025-12-04T10:13:48.0495444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0495536Z with policy(): 2025-12-04T10:13:48.0496048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0496155Z raise RuntimeError(msg) 2025-12-04T10:13:48.0497440Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T10:13:48.0497455Z 2025-12-04T10:13:48.0497667Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0498391Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0498399Z 2025-12-04T10:13:48.0498671Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0498879Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
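The ProcessGroupNCCL warning earlier in this run reports that destroy_process_group() was not called before the worker processes exited. A minimal teardown sketch is shown below; shutdown_distributed is a hypothetical helper meant to run at the end of each worker, per the distributed shutdown docs linked in the warning.

    import torch.distributed as dist

    def shutdown_distributed() -> None:
        # Explicitly destroying the default process group before exit releases
        # NCCL resources and avoids the "destroy_process_group() was not
        # called before program exit" warning.
        if dist.is_initialized():
            dist.destroy_process_group()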
2025-12-04T10:13:48.0499065Z ====================== 1 failed, 32 deselected in 57.98s ======================= 2025-12-04T10:13:48.0499161Z Got exit code 1 2025-12-04T10:13:48.0499816Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.0500228Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.0500842Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-588db22c786ffc0c.xml 2025-12-04T10:13:48.0501007Z ============================= test session starts ============================== 2025-12-04T10:13:48.0501353Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.0501455Z cachedir: .pytest_cache 2025-12-04T10:13:48.0502007Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.0502123Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.0502224Z configfile: pytest.ini 2025-12-04T10:13:48.0502766Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.0502977Z collecting ... collected 60 items / 10 deselected / 50 selected 2025-12-04T10:13:48.0503119Z stepcurrent: skipping 10 already run items. 2025-12-04T10:13:48.0503225Z Running 23 items in this shard 2025-12-04T10:13:48.0503232Z 2025-12-04T10:13:48.0504336Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda I1204 09:43:23.359000 59313 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 59365 2025-12-04T10:13:48.0504841Z I1204 09:43:23.360000 59313 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 59366 2025-12-04T10:13:48.0505335Z I1204 09:43:23.361000 59313 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 59367 2025-12-04T10:13:48.0506026Z I1204 09:43:23.362000 59313 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 59368 2025-12-04T10:13:48.0506910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0507023Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.0508825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0508909Z _warn_cpu_init() 2025-12-04T10:13:48.0509820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.0510014Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.0510890Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0511010Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.0512800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0512894Z _warn_cpu_init() 2025-12-04T10:13:48.0513768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0513889Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.0514759Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0514907Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.0516671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0516756Z _warn_cpu_init() 2025-12-04T10:13:48.0518539Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0518656Z _warn_cpu_init() 2025-12-04T10:13:48.0519534Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.0519627Z return func(*args, **kwargs) 2025-12-04T10:13:48.0520505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.0520699Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.0521568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0521771Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.0522666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0522860Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.0523537Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0523634Z return func(*args, **kwargs) 2025-12-04T10:13:48.0524314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0524408Z return func(*args, **kwargs) 2025-12-04T10:13:48.0525127Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0525223Z return func(*args, **kwargs) 2025-12-04T10:13:48.0525892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0525989Z return func(*args, **kwargs) 2025-12-04T10:13:48.0526653Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0526756Z return func(*args, **kwargs) 2025-12-04T10:13:48.0527420Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0527545Z return func(*args, **kwargs) 2025-12-04T10:13:48.0528219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0528313Z return func(*args, **kwargs) 2025-12-04T10:13:48.0528989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T10:13:48.0529083Z return func(*args, **kwargs) 2025-12-04T10:13:48.0529485Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0529998Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0530887Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0531345Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0532220Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0532569Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0533492Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0534134Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0535135Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0535618Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0536576Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0537017Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0538005Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0538502Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0540180Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 0. CUDA driver allocated memory was 714014720 and is now 760152064. 
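The FutureWarnings at the start of this test ("The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead.") point to plain DDP as the replacement for unsharded FSDP. The sketch below illustrates that suggestion; it assumes an already-initialized process group, and wrap_with_ddp is a hypothetical helper rather than code from common_fsdp.py.

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_with_ddp(module: torch.nn.Module, local_rank: int) -> DDP:
        # DDP replicates the whole module on each rank with no parameter
        # sharding, which is effectively what NO_SHARD provided.
        module = module.to(f"cuda:{local_rank}")
        return DDP(module, device_ids=[local_rank])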
2025-12-04T10:13:48.0540550Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0541201Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0542395Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0542757Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0543468Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0544020Z [rank0]:E1204 09:43:31.434000 59365 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.0544497Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0545038Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0546222Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0546674Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0547542Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0548059Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0548966Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0549449Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0550353Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0550805Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0551708Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0552216Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0553124Z [rank2]:E1204 09:43:31.434000 59367 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0553589Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0555162Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 2. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T10:13:48.0555515Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0556156Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0557248Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0557592Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0558475Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0559004Z [rank2]:E1204 09:43:31.434000 59367 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.0559438Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0560059Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0560994Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0561468Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0562394Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0562769Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0563698Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0564151Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0565049Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0565503Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0566423Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0566844Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0567737Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0568201Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0569787Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T10:13:48.0570255Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0570839Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0571863Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0572211Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0572841Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0573421Z [rank3]:E1204 09:43:31.436000 59368 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.0574017Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0574556Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0575546Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0576060Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0577043Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0577479Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0578435Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0579126Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0580096Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0580638Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0581599Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0582041Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0582993Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0583489Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0585205Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 
2025-12-04T10:13:48.0585573Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0586227Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0587422Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0587786Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0588498Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0589050Z [rank1]:E1204 09:43:31.439000 59366 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.0589150Z dist init r=1, world=4 2025-12-04T10:13:48.0589255Z dist init r=2, world=4 2025-12-04T10:13:48.0589351Z dist init r=0, world=4 2025-12-04T10:13:48.0589445Z dist init r=3, world=4 2025-12-04T10:13:48.0590714Z [rank0]:[W1204 09:43:31.977924230 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.0590805Z FAILED [10.5339s] [ 4%] 2025-12-04T10:13:48.0590813Z 2025-12-04T10:13:48.0590947Z =================================== FAILURES =================================== 2025-12-04T10:13:48.0591261Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda _ 2025-12-04T10:13:48.0591369Z Traceback (most recent call last): 2025-12-04T10:13:48.0591858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.0591956Z self._join_processes(fn) 2025-12-04T10:13:48.0592470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.0592607Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.0593137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.0593241Z raise RuntimeError(error) 2025-12-04T10:13:48.0593476Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.0593579Z Traceback (most recent call last): 2025-12-04T10:13:48.0594061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0594157Z getattr(self, test_name)() 2025-12-04T10:13:48.0594625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0594708Z fn() 2025-12-04T10:13:48.0595153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0595251Z method(*args, **kwargs) 2025-12-04T10:13:48.0595691Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0595779Z method(*args, **kwargs) 2025-12-04T10:13:48.0596255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0596341Z with policy(): 2025-12-04T10:13:48.0596822Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0596965Z raise RuntimeError(msg) 2025-12-04T10:13:48.0599014Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.0599110Z 2025-12-04T10:13:48.0599339Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0600007Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0600016Z 2025-12-04T10:13:48.0600272Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0600278Z 2025-12-04T10:13:48.0600431Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.0600543Z Traceback (most recent call last): 2025-12-04T10:13:48.0601069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0601173Z getattr(self, test_name)() 2025-12-04T10:13:48.0601684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0601768Z fn() 2025-12-04T10:13:48.0602239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0602344Z method(*args, **kwargs) 2025-12-04T10:13:48.0602815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0602910Z method(*args, **kwargs) 2025-12-04T10:13:48.0603417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0603505Z with policy(): 2025-12-04T10:13:48.0603987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0604086Z raise RuntimeError(msg) 2025-12-04T10:13:48.0605233Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 2. CUDA driver allocated memory was 611254272 and is now 651100160. 
2025-12-04T10:13:48.0605241Z 2025-12-04T10:13:48.0605447Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0606137Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0606144Z 2025-12-04T10:13:48.0606397Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0606402Z 2025-12-04T10:13:48.0606406Z 2025-12-04T10:13:48.0606615Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.0606865Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.0607612Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-588db22c786ffc0c.xml - 2025-12-04T10:13:48.0607770Z =========================== short test summary info ============================ 2025-12-04T10:13:48.0608592Z FAILED [10.5339s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.0608736Z Traceback (most recent call last): 2025-12-04T10:13:48.0609246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0609352Z getattr(self, test_name)() 2025-12-04T10:13:48.0609855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0609941Z fn() 2025-12-04T10:13:48.0610415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0610539Z method(*args, **kwargs) 2025-12-04T10:13:48.0611103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0611192Z method(*args, **kwargs) 2025-12-04T10:13:48.0611639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0611728Z with policy(): 2025-12-04T10:13:48.0612176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0612276Z raise RuntimeError(msg) 2025-12-04T10:13:48.0613457Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 
2025-12-04T10:13:48.0613466Z 2025-12-04T10:13:48.0613832Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0614529Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0614537Z 2025-12-04T10:13:48.0614799Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0614804Z 2025-12-04T10:13:48.0614972Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.0615126Z Traceback (most recent call last): 2025-12-04T10:13:48.0615672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0615778Z getattr(self, test_name)() 2025-12-04T10:13:48.0616302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0616395Z fn() 2025-12-04T10:13:48.0616894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0616993Z method(*args, **kwargs) 2025-12-04T10:13:48.0617494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0617624Z method(*args, **kwargs) 2025-12-04T10:13:48.0618124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0618217Z with policy(): 2025-12-04T10:13:48.0618717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0618829Z raise RuntimeError(msg) 2025-12-04T10:13:48.0620048Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 2. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T10:13:48.0620056Z 2025-12-04T10:13:48.0620272Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0620969Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0621003Z 2025-12-04T10:13:48.0621264Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0621447Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.0621622Z ====================== 1 failed, 10 deselected in 10.75s ======================= 2025-12-04T10:13:48.0621719Z Got exit code 1 2025-12-04T10:13:48.0621818Z Retrying single test... 
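The failure above is the mem-leak check wrapper enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 (the same flag the repro command reprints) comparing per-device allocator state before and after the test body. As a rough standalone illustration of that kind of check, not PyTorch's actual leak checker, the comparison boils down to snapshotting torch.cuda.memory_allocated() and torch.cuda.mem_get_info() around the test; the helper name and thresholds below are made up for the sketch:

import gc
import torch

def check_cuda_leak(run_test, device: int = 0) -> None:
    # Illustrative before/after comparison in the spirit of
    # PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1; not the real checker.
    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    allocated_before = torch.cuda.memory_allocated(device)
    driver_free_before, _total = torch.cuda.mem_get_info(device)

    run_test()  # the test body under observation

    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    allocated_after = torch.cuda.memory_allocated(device)
    driver_free_after, _total = torch.cuda.mem_get_info(device)

    if allocated_after > allocated_before or driver_free_after < driver_free_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: caching allocator "
            f"{allocated_before} -> {allocated_after} bytes, driver free "
            f"{driver_free_before} -> {driver_free_after} bytes"
        )

The real checker is stricter than this outline: as the error text above shows, it cross-checks the caching-allocator delta against the CUDA driver API before declaring a confirmed leak.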
2025-12-04T10:13:48.0622443Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d2c93dc13a89050c.xml 2025-12-04T10:13:48.0622654Z ============================= test session starts ============================== 2025-12-04T10:13:48.0622996Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.0623101Z cachedir: .pytest_cache 2025-12-04T10:13:48.0623629Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.0623750Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.0623858Z configfile: pytest.ini 2025-12-04T10:13:48.0624391Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.0624604Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.0625392Z stepcurrent: skipping 10 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0625503Z Running 1 items in this shard 2025-12-04T10:13:48.0625509Z 2025-12-04T10:13:48.0626604Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda I1204 09:43:38.349000 59650 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 59702 2025-12-04T10:13:48.0627045Z I1204 09:43:38.350000 59650 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 59703 2025-12-04T10:13:48.0627505Z I1204 09:43:38.351000 59650 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 59704 2025-12-04T10:13:48.0627942Z I1204 09:43:38.352000 59650 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 59705 2025-12-04T10:13:48.0628827Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0628954Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.0630776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0630873Z _warn_cpu_init() 2025-12-04T10:13:48.0631753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0631947Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.0632824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.0632968Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.0633839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0633953Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.0635733Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0635853Z _warn_cpu_init() 2025-12-04T10:13:48.0637618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0637709Z _warn_cpu_init() 2025-12-04T10:13:48.0638584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.0638688Z return func(*args, **kwargs) 2025-12-04T10:13:48.0639561Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0639692Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.0641475Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0641562Z _warn_cpu_init() 2025-12-04T10:13:48.0642442Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0642635Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.0643542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.0643738Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.0644610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0644805Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.0645482Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0645587Z return func(*args, **kwargs) 2025-12-04T10:13:48.0646262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0646387Z return func(*args, **kwargs) 2025-12-04T10:13:48.0647067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0647160Z return func(*args, **kwargs) 2025-12-04T10:13:48.0647834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0647925Z return func(*args, **kwargs) 2025-12-04T10:13:48.0648803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0648906Z return func(*args, **kwargs) 2025-12-04T10:13:48.0649610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0649713Z return func(*args, **kwargs) 2025-12-04T10:13:48.0650418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0650515Z return func(*args, **kwargs) 2025-12-04T10:13:48.0651230Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T10:13:48.0651328Z return func(*args, **kwargs) 2025-12-04T10:13:48.0651764Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0652263Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0653320Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0653981Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0654962Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0655366Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0656352Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0656845Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0657798Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0658275Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0659243Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0659684Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0660682Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0661173Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0662854Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 0. CUDA driver allocated memory was 718209024 and is now 760152064. 
2025-12-04T10:13:48.0663245Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0663903Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0665067Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0665428Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0666205Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0666690Z [rank0]:E1204 09:43:46.505000 59702 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.0667095Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0667586Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0668471Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0668922Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0669791Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0670172Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0671020Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0671448Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0672290Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0672718Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0673566Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0673999Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0674853Z [rank1]:E1204 09:43:46.506000 59703 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0675286Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0676815Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.0677138Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0677720Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0678914Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0679435Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0680153Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0680700Z [rank1]:E1204 09:43:46.506000 59703 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.0681227Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0681756Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0682757Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0683278Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0684313Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0684717Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0685667Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0686158Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0687112Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0687598Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0688598Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0689037Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0689999Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0690528Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0692378Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 2. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T10:13:48.0692719Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0693393Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0694709Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0695069Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0695796Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0696370Z [rank2]:E1204 09:43:46.507000 59704 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.0696824Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0697351Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0698345Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0698891Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0699875Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0700280Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0701233Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0701726Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0702679Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0703523Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0704488Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0704928Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0706117Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0706549Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0708042Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 604962816 and is now 651100160. 
2025-12-04T10:13:48.0708359Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0708943Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0709973Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0710319Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0710958Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0711436Z [rank3]:E1204 09:43:46.507000 59705 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.0711531Z dist init r=0, world=4 2025-12-04T10:13:48.0711618Z dist init r=1, world=4 2025-12-04T10:13:48.0711700Z dist init r=3, world=4 2025-12-04T10:13:48.0711786Z dist init r=2, world=4 2025-12-04T10:13:48.0712836Z [rank0]:[W1204 09:43:46.017182766 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.0712924Z FAILED [10.5075s] [100%] 2025-12-04T10:13:48.0712930Z 2025-12-04T10:13:48.0713065Z =================================== FAILURES =================================== 2025-12-04T10:13:48.0713342Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda _ 2025-12-04T10:13:48.0713453Z Traceback (most recent call last): 2025-12-04T10:13:48.0713935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.0714030Z self._join_processes(fn) 2025-12-04T10:13:48.0714548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.0714670Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.0715204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.0715333Z raise RuntimeError(error) 2025-12-04T10:13:48.0715713Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.0715832Z Traceback (most recent call last): 2025-12-04T10:13:48.0716334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0716435Z getattr(self, test_name)() 2025-12-04T10:13:48.0716935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0717047Z fn() 2025-12-04T10:13:48.0717516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0717615Z method(*args, **kwargs) 2025-12-04T10:13:48.0718089Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0718194Z method(*args, **kwargs) 2025-12-04T10:13:48.0718665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0718751Z with policy(): 2025-12-04T10:13:48.0719233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0719333Z raise RuntimeError(msg) 2025-12-04T10:13:48.0720495Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.0720504Z 2025-12-04T10:13:48.0720703Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0721363Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0721378Z 2025-12-04T10:13:48.0721650Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0721655Z 2025-12-04T10:13:48.0721809Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.0721931Z Traceback (most recent call last): 2025-12-04T10:13:48.0722444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0722548Z getattr(self, test_name)() 2025-12-04T10:13:48.0723058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0723142Z fn() 2025-12-04T10:13:48.0723624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0723762Z method(*args, **kwargs) 2025-12-04T10:13:48.0724233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0724338Z method(*args, **kwargs) 2025-12-04T10:13:48.0724808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0724899Z with policy(): 2025-12-04T10:13:48.0725374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0725475Z raise RuntimeError(msg) 2025-12-04T10:13:48.0726636Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 604962816 and is now 651100160. 
2025-12-04T10:13:48.0726671Z 2025-12-04T10:13:48.0726872Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0727531Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0727545Z 2025-12-04T10:13:48.0727792Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0727797Z 2025-12-04T10:13:48.0727801Z 2025-12-04T10:13:48.0728005Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.0728256Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.0729032Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d2c93dc13a89050c.xml - 2025-12-04T10:13:48.0729200Z =========================== short test summary info ============================ 2025-12-04T10:13:48.0730020Z FAILED [10.5075s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.0730133Z Traceback (most recent call last): 2025-12-04T10:13:48.0730650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0730753Z getattr(self, test_name)() 2025-12-04T10:13:48.0731259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0731341Z fn() 2025-12-04T10:13:48.0731815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0731916Z method(*args, **kwargs) 2025-12-04T10:13:48.0732388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0732483Z method(*args, **kwargs) 2025-12-04T10:13:48.0732959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0733081Z with policy(): 2025-12-04T10:13:48.0733802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0733913Z raise RuntimeError(msg) 2025-12-04T10:13:48.0735141Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 
2025-12-04T10:13:48.0735149Z 2025-12-04T10:13:48.0735371Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0736106Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0736114Z 2025-12-04T10:13:48.0736383Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0736391Z 2025-12-04T10:13:48.0736550Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.0736666Z Traceback (most recent call last): 2025-12-04T10:13:48.0737219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0737324Z getattr(self, test_name)() 2025-12-04T10:13:48.0737864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0737950Z fn() 2025-12-04T10:13:48.0738448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0738592Z method(*args, **kwargs) 2025-12-04T10:13:48.0739090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0739191Z method(*args, **kwargs) 2025-12-04T10:13:48.0739700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0739796Z with policy(): 2025-12-04T10:13:48.0740306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0740410Z raise RuntimeError(msg) 2025-12-04T10:13:48.0741661Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T10:13:48.0741667Z 2025-12-04T10:13:48.0741890Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0742588Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0742594Z 2025-12-04T10:13:48.0742861Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0743037Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.0743212Z ====================== 1 failed, 32 deselected in 10.72s ======================= 2025-12-04T10:13:48.0743314Z Got exit code 1 2025-12-04T10:13:48.0743419Z Retrying single test... 
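Alongside the leak itself, both runs repeat the same advisory warnings: pass `device_id` so FSDP's sharding initialization runs on the GPU (and so `sync_module_states=True` can broadcast), prefer `DistributedDataParallel` over the deprecated `NO_SHARD` strategy, bind the process group to a device, and call `destroy_process_group()` before exit. A minimal sketch of that wiring, assuming a torchrun launch that sets LOCAL_RANK and using a small nn.Linear as a stand-in for the test harness's real model:

import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main() -> None:
    local_rank = int(os.environ["LOCAL_RANK"])  # provided by torchrun
    device = torch.device("cuda", local_rank)
    torch.cuda.set_device(device)

    # Binding the group to a device is what the c10d barrier() warning
    # above suggests via init_process_group's device_id argument.
    dist.init_process_group(backend="nccl", device_id=device)
    try:
        module = torch.nn.Linear(8, 8)

        # device_id moves sharding initialization onto the GPU and is needed
        # for sync_module_states=True, per the _warn_cpu_init message above.
        model = FSDP(module, device_id=device, sync_module_states=True)

        # Alternative per the FutureWarning: instead of FSDP's deprecated
        # NO_SHARD strategy, wrap with
        # torch.nn.parallel.DistributedDataParallel(module.to(device),
        #                                           device_ids=[local_rank]).

        out = model(torch.randn(4, 8, device=device))
        out.sum().backward()
        dist.barrier()
    finally:
        # Explicit teardown avoids the ProcessGroupNCCL warning about
        # destroy_process_group() not being called before program exit.
        dist.destroy_process_group()

if __name__ == "__main__":
    main()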
2025-12-04T10:13:48.0744047Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e269a47641789945.xml 2025-12-04T10:13:48.0744203Z ============================= test session starts ============================== 2025-12-04T10:13:48.0744554Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.0744662Z cachedir: .pytest_cache 2025-12-04T10:13:48.0745204Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.0745325Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.0745435Z configfile: pytest.ini 2025-12-04T10:13:48.0746050Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.0746257Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.0746996Z stepcurrent: skipping 10 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0747097Z Running 1 items in this shard 2025-12-04T10:13:48.0747103Z 2025-12-04T10:13:48.0748142Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda I1204 09:43:53.330000 59987 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 60039 2025-12-04T10:13:48.0748608Z I1204 09:43:53.331000 59987 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 60040 2025-12-04T10:13:48.0749075Z I1204 09:43:53.331000 59987 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 60041 2025-12-04T10:13:48.0749534Z I1204 09:43:53.332000 59987 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 60042 2025-12-04T10:13:48.0750470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0750604Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.0752520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0752622Z _warn_cpu_init() 2025-12-04T10:13:48.0753547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0753706Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.0755600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0755697Z _warn_cpu_init() 2025-12-04T10:13:48.0756619Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0756824Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.0757759Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0757962Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.0758998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.0759103Z return func(*args, **kwargs) 2025-12-04T10:13:48.0760027Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0760162Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.0761082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0761215Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.0763125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0763226Z _warn_cpu_init() 2025-12-04T10:13:48.0765098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0765222Z _warn_cpu_init() 2025-12-04T10:13:48.0766159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.0766361Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.0767296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.0767534Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.0768264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0768372Z return func(*args, **kwargs) 2025-12-04T10:13:48.0769086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0769194Z return func(*args, **kwargs) 2025-12-04T10:13:48.0769909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0770019Z return func(*args, **kwargs) 2025-12-04T10:13:48.0770730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0770827Z return func(*args, **kwargs) 2025-12-04T10:13:48.0771546Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0771644Z return func(*args, **kwargs) 2025-12-04T10:13:48.0772388Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0772495Z return func(*args, **kwargs) 2025-12-04T10:13:48.0773273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.0773382Z return func(*args, **kwargs) 2025-12-04T10:13:48.0774281Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T10:13:48.0774389Z return func(*args, **kwargs) 2025-12-04T10:13:48.0774891Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0775425Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0776434Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0776939Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0777930Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0778321Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0779506Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0780012Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0780969Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0781531Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0782484Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0782938Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0783899Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0784383Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0786078Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 0. CUDA driver allocated memory was 714014720 and is now 760152064. 
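The RuntimeError above comes from the test framework's CUDA memory leak checker, which compares caching-allocator and driver-level memory before and after the test body and fails the test when the numbers grow. The snippet below is only a rough, stand-alone approximation of that idea built on public `torch.cuda` APIs; it is not the checker implemented in `common_utils.py`, and the driver figure is approximated as total minus free device memory.

import torch

def snapshot(device: int) -> tuple[int, int]:
    """Return (caching-allocator bytes, approx. driver-allocated bytes) for `device`."""
    torch.cuda.synchronize(device)
    allocator_bytes = torch.cuda.memory_allocated(device)
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    return allocator_bytes, total_bytes - free_bytes

def check_for_leak(fn, device: int = 0) -> None:
    # Measure before, run the workload, measure after; a persistent increase in
    # allocator usage is the kind of delta reported above (512 -> 95744 bytes).
    before_alloc, before_driver = snapshot(device)
    fn()
    after_alloc, after_driver = snapshot(device)
    if after_alloc > before_alloc:
        raise RuntimeError(
            f"possible leak: caching allocator went from {before_alloc} "
            f"to {after_alloc} bytes (driver: {before_driver} -> {after_driver})"
        )

For the real checker, the repro command printed below (with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1) reruns just the failing test with leak checking enabled.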
2025-12-04T10:13:48.0786446Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0787144Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0788302Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0788666Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0789379Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0789963Z [rank0]:E1204 09:44:01.394000 60039 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.0790422Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0791024Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0791914Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0792364Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0793241Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0793625Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0794473Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0794905Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0795752Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0796213Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0797064Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0797462Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0798310Z [rank1]:E1204 09:44:01.395000 60040 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0798742Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0800237Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T10:13:48.0800586Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0801173Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0802192Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0802518Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0803176Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0803658Z [rank1]:E1204 09:44:01.395000 60040 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.0804064Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0804530Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0805418Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0805864Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0806773Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0807122Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0807965Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0808426Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0809270Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0809707Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0810551Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0810945Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0811807Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0812240Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0814041Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 103936 on device 3. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T10:13:48.0814409Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0815072Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0816257Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0816630Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0817343Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0817884Z [rank3]:E1204 09:44:01.395000 60042 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.0818338Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0818869Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0819879Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0820414Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0821402Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0821802Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0822807Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0823297Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0824258Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0824744Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0825700Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0826210Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0827064Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0827521Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0829017Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T10:13:48.0829340Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0829927Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0830976Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0831301Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0831932Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0832414Z [rank2]:E1204 09:44:01.396000 60041 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.0832511Z dist init r=0, world=4 2025-12-04T10:13:48.0832596Z dist init r=1, world=4 2025-12-04T10:13:48.0832680Z dist init r=3, world=4 2025-12-04T10:13:48.0832771Z dist init r=2, world=4 2025-12-04T10:13:48.0833825Z [rank0]:[W1204 09:44:01.906865505 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.0833920Z FAILED [10.2037s] [100%] 2025-12-04T10:13:48.0833926Z 2025-12-04T10:13:48.0834058Z =================================== FAILURES =================================== 2025-12-04T10:13:48.0834338Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda _ 2025-12-04T10:13:48.0834450Z Traceback (most recent call last): 2025-12-04T10:13:48.0834928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.0835062Z self._join_processes(fn) 2025-12-04T10:13:48.0835578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.0835706Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.0836249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.0836348Z raise RuntimeError(error) 2025-12-04T10:13:48.0836554Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.0836665Z Traceback (most recent call last): 2025-12-04T10:13:48.0837137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0837244Z getattr(self, test_name)() 2025-12-04T10:13:48.0837716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0837792Z fn() 2025-12-04T10:13:48.0838247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0838341Z method(*args, **kwargs) 2025-12-04T10:13:48.0838786Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0838911Z method(*args, **kwargs) 2025-12-04T10:13:48.0839353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0839446Z with policy(): 2025-12-04T10:13:48.0839892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0839988Z raise RuntimeError(msg) 2025-12-04T10:13:48.0841081Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T10:13:48.0841089Z 2025-12-04T10:13:48.0841311Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0841945Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0841951Z 2025-12-04T10:13:48.0842184Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0842189Z 2025-12-04T10:13:48.0842193Z 2025-12-04T10:13:48.0842394Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.0842628Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.0843333Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e269a47641789945.xml - 2025-12-04T10:13:48.0843487Z =========================== short test summary info ============================ 2025-12-04T10:13:48.0844281Z FAILED [10.2037s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.0844443Z Traceback (most recent call last): 2025-12-04T10:13:48.0845243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0845412Z getattr(self, test_name)() 2025-12-04T10:13:48.0846292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0846491Z fn() 2025-12-04T10:13:48.0847290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0847457Z method(*args, **kwargs) 2025-12-04T10:13:48.0848266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0848450Z method(*args, **kwargs) 2025-12-04T10:13:48.0849276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0849427Z with policy(): 2025-12-04T10:13:48.0850171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0850329Z raise RuntimeError(msg) 2025-12-04T10:13:48.0852272Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T10:13:48.0852296Z 2025-12-04T10:13:48.0852628Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0853987Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0854006Z 2025-12-04T10:13:48.0854616Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0854917Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.0855232Z ====================== 1 failed, 32 deselected in 10.42s ======================= 2025-12-04T10:13:48.0855409Z Got exit code 1 2025-12-04T10:13:48.0856569Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T10:13:48.0857287Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.0858437Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d0e2108c889b6f40.xml 2025-12-04T10:13:48.0858817Z ============================= test session starts ============================== 2025-12-04T10:13:48.0859452Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.0859654Z cachedir: .pytest_cache 2025-12-04T10:13:48.0860632Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.0860854Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.0861039Z configfile: pytest.ini 2025-12-04T10:13:48.0862056Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.0862473Z collecting ... collected 60 items / 11 deselected / 49 selected 2025-12-04T10:13:48.0862740Z stepcurrent: skipping 11 already run items. 2025-12-04T10:13:48.0862947Z Running 22 items in this shard 2025-12-04T10:13:48.0862959Z 2025-12-04T10:13:48.0865099Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda I1204 09:44:08.340000 60324 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 60376 2025-12-04T10:13:48.0866104Z I1204 09:44:08.341000 60324 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 60377 2025-12-04T10:13:48.0867005Z I1204 09:44:08.341000 60324 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 60378 2025-12-04T10:13:48.0868066Z I1204 09:44:08.342000 60324 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 60379 2025-12-04T10:13:48.0871436Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0871615Z _warn_cpu_init() 2025-12-04T10:13:48.0875009Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0875180Z _warn_cpu_init() 2025-12-04T10:13:48.0878562Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0879176Z _warn_cpu_init() 2025-12-04T10:13:48.0882300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0882417Z _warn_cpu_init() 2025-12-04T10:13:48.0883525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
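The `barrier()` UserWarning just above, and the ProcessGroupNCCL warning earlier in the log about `destroy_process_group()` never being called, both concern process-group setup and teardown. A minimal sketch of the suggested pattern follows; it is illustrative only and assumes a `torchrun` launch with the NCCL backend.

import os
import torch
import torch.distributed as dist

def main() -> None:
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    # Binding the group to a device up front is what the warning means by
    # "specify `device_id` in `init_process_group`": barrier() then knows which
    # CUDA device to use instead of inferring it from the current context.
    dist.init_process_group(backend="nccl", device_id=torch.device("cuda", rank))

    dist.barrier()

    # Explicit teardown avoids the ProcessGroupNCCL "destroy_process_group()
    # was not called before program exit" warning printed after each run above.
    dist.destroy_process_group()

if __name__ == "__main__":
    main()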
2025-12-04T10:13:48.0883638Z return func(*args, **kwargs) 2025-12-04T10:13:48.0884111Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0884640Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0885642Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0886157Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0887144Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0887614Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0888568Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0889055Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0890058Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0890537Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0891703Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0892119Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0893024Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0893568Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0895450Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 716111872 and is now 734986240. 
2025-12-04T10:13:48.0895814Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0896468Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0897604Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.0897964Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0898715Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0899258Z [rank0]:E1204 09:44:47.148000 60376 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.0899707Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0900242Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0901239Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0901749Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0902762Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0903165Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0904115Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0904628Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0905686Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0906117Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0906966Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0907352Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0908207Z [rank1]:E1204 09:44:47.148000 60377 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0908638Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0910151Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:48.0910471Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0911047Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0912084Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.0912403Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0913042Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0913519Z [rank1]:E1204 09:44:47.148000 60377 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.0913913Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0914392Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0915274Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0915760Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0916634Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0916990Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0917863Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0918289Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0919146Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0919572Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0920421Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0920811Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0921666Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0922124Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0923777Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.0924124Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0924737Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0925841Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.0926180Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0926855Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0927361Z [rank2]:E1204 09:44:47.149000 60378 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.0927778Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0928283Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0929245Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0929725Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0930644Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0931051Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0931949Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0932403Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0933379Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0934015Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0934984Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0935430Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0936444Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0936929Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0938594Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 611254272 and is now 625934336. 
2025-12-04T10:13:48.0938964Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0939666Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0940807Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.0941164Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0941883Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0942423Z [rank3]:E1204 09:44:47.149000 60379 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.0942552Z dist init r=1, world=4 2025-12-04T10:13:48.0942658Z dist init r=3, world=4 2025-12-04T10:13:48.0942751Z dist init r=2, world=4 2025-12-04T10:13:48.0942842Z dist init r=0, world=4 2025-12-04T10:13:48.0944010Z [rank0]:[W1204 09:44:47.668178157 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.0944109Z FAILED [40.3092s] [ 4%] 2025-12-04T10:13:48.0944117Z 2025-12-04T10:13:48.0944270Z =================================== FAILURES =================================== 2025-12-04T10:13:48.0944611Z ___ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda ___ 2025-12-04T10:13:48.0944731Z Traceback (most recent call last): 2025-12-04T10:13:48.0945281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.0945397Z self._join_processes(fn) 2025-12-04T10:13:48.0946037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.0946159Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.0946692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.0946797Z raise RuntimeError(error) 2025-12-04T10:13:48.0946998Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.0947108Z Traceback (most recent call last): 2025-12-04T10:13:48.0947585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0947680Z getattr(self, test_name)() 2025-12-04T10:13:48.0948159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0948235Z fn() 2025-12-04T10:13:48.0948675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0948806Z method(*args, **kwargs) 2025-12-04T10:13:48.0949247Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0949344Z method(*args, **kwargs) 2025-12-04T10:13:48.0949786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0949870Z with policy(): 2025-12-04T10:13:48.0950324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0950416Z raise RuntimeError(msg) 2025-12-04T10:13:48.0951519Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:48.0951535Z 2025-12-04T10:13:48.0951725Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0952329Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.0952334Z 2025-12-04T10:13:48.0952572Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0952576Z 2025-12-04T10:13:48.0952583Z 2025-12-04T10:13:48.0952774Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.0953014Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.0953721Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d0e2108c889b6f40.xml - 2025-12-04T10:13:48.0953896Z =========================== short test summary info ============================ 2025-12-04T10:13:48.0954657Z FAILED [40.3092s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.0954760Z Traceback (most recent call last): 2025-12-04T10:13:48.0955246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0955417Z getattr(self, test_name)() 2025-12-04T10:13:48.0956230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0956380Z fn() 2025-12-04T10:13:48.0957123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0957446Z method(*args, **kwargs) 2025-12-04T10:13:48.0958013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0958116Z method(*args, **kwargs) 2025-12-04T10:13:48.0958597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0958689Z with policy(): 2025-12-04T10:13:48.0959167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0959278Z raise RuntimeError(msg) 2025-12-04T10:13:48.0960428Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:48.0960440Z 2025-12-04T10:13:48.0960657Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0961374Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.0961381Z 2025-12-04T10:13:48.0961632Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0961806Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.0961975Z ====================== 1 failed, 11 deselected in 40.53s ======================= 2025-12-04T10:13:48.0962081Z Got exit code 1 2025-12-04T10:13:48.0962182Z Retrying single test... 2025-12-04T10:13:48.0962770Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d7cc16231ece4156.xml 2025-12-04T10:13:48.0962928Z ============================= test session starts ============================== 2025-12-04T10:13:48.0963295Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.0963405Z cachedir: .pytest_cache 2025-12-04T10:13:48.0963893Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.0964005Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.0964113Z configfile: pytest.ini 2025-12-04T10:13:48.0964616Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.0964822Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.0969322Z stepcurrent: skipping 11 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.0969448Z Running 1 items in this shard 2025-12-04T10:13:48.0969522Z 2025-12-04T10:13:48.0970493Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda I1204 09:44:53.510000 60661 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 60713 2025-12-04T10:13:48.0970934Z I1204 09:44:53.511000 60661 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 60714 2025-12-04T10:13:48.0971364Z I1204 09:44:53.511000 60661 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 60715 2025-12-04T10:13:48.0971797Z I1204 09:44:53.512000 60661 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 60716 2025-12-04T10:13:48.0973899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
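The harness lines above ("Got exit code 1", "Retrying single test...", and earlier "FAILED CONSISTENTLY ... continuing with the rest of the tests due to continue-through-error being set") describe the retry policy: after a shard-level failure the failing test is rerun in isolation, and only if it fails again is it marked as consistently failing while the run continues. The sketch below is a deliberately simplified illustration of that control flow, not the actual run_test.py logic, and the `run_pytest` helper is hypothetical.

import subprocess

def run_pytest(test_id: str) -> int:
    # Hypothetical helper: run one pytest target, stopping at the first failure.
    return subprocess.run(["python", "-m", "pytest", test_id, "-x"]).returncode

def run_with_retry(test_id: str, continue_through_error: bool = True) -> bool:
    # First failure triggers a single-test rerun; a second failure marks the
    # test as consistently failing (mirroring "FAILED CONSISTENTLY" above).
    if run_pytest(test_id) == 0:
        return True
    print("Got exit code 1")
    print("Retrying single test...")
    if run_pytest(test_id) == 0:
        return True
    print(f"FAILED CONSISTENTLY: {test_id}")
    if not continue_through_error:
        raise SystemExit(1)
    return False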
2025-12-04T10:13:48.0974013Z _warn_cpu_init() 2025-12-04T10:13:48.0976008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0976111Z _warn_cpu_init() 2025-12-04T10:13:48.0978142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.0978243Z _warn_cpu_init() 2025-12-04T10:13:48.0979449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.0979560Z return func(*args, **kwargs) 2025-12-04T10:13:48.0981650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.0981746Z _warn_cpu_init() 2025-12-04T10:13:48.0982205Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0982734Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0983735Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0984242Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0985216Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0985657Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.0986607Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0987095Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0988155Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.0988647Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.0989600Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.0990040Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.0991070Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.0991499Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.0993033Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 716111872 and is now 734986240. 
2025-12-04T10:13:48.0993355Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0993937Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.0994942Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.0995284Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.0995925Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.0996401Z [rank0]:E1204 09:45:30.564000 60713 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.0996802Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.0997265Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.0998151Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.0998624Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.0999492Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.0999844Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1000685Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1001151Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1001998Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1002425Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1003276Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1003663Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1004515Z [rank2]:E1204 09:45:30.567000 60715 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1004946Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1006445Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T10:13:48.1006761Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1007343Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1008363Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.1008683Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1009316Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1009796Z [rank2]:E1204 09:45:30.567000 60715 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.1010195Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1010657Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1011530Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1012007Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1012873Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1013284Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1014404Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1014893Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1015846Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1016325Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1017280Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1017722Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1018684Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1019197Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1020862Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.1021225Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1021909Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1023039Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.1023397Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1024110Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1024652Z [rank1]:E1204 09:45:30.567000 60714 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.1025099Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1025655Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1026666Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1027117Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1027981Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1028360Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1029203Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1029637Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1030483Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1030904Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1031751Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1032142Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1033020Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1033451Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1034934Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 611254272 and is now 625934336. 
2025-12-04T10:13:48.1035282Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1035869Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1036871Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.1037188Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1037824Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1038302Z [rank3]:E1204 09:45:30.567000 60716 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.1038425Z dist init r=1, world=4 2025-12-04T10:13:48.1038509Z dist init r=3, world=4 2025-12-04T10:13:48.1038591Z dist init r=0, world=4 2025-12-04T10:13:48.1038678Z dist init r=2, world=4 2025-12-04T10:13:48.1039698Z [rank0]:[W1204 09:45:30.078564659 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.1039783Z FAILED [39.2455s] [100%] 2025-12-04T10:13:48.1039794Z 2025-12-04T10:13:48.1039918Z =================================== FAILURES =================================== 2025-12-04T10:13:48.1040231Z ___ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda ___ 2025-12-04T10:13:48.1040339Z Traceback (most recent call last): 2025-12-04T10:13:48.1040823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.1040923Z self._join_processes(fn) 2025-12-04T10:13:48.1041447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.1041571Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.1042105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.1042201Z raise RuntimeError(error) 2025-12-04T10:13:48.1042402Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.1042511Z Traceback (most recent call last): 2025-12-04T10:13:48.1042983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1043077Z getattr(self, test_name)() 2025-12-04T10:13:48.1043554Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1043630Z fn() 2025-12-04T10:13:48.1044107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1044195Z method(*args, **kwargs) 2025-12-04T10:13:48.1044638Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1044734Z method(*args, **kwargs) 2025-12-04T10:13:48.1045174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1045257Z with policy(): 2025-12-04T10:13:48.1045707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1045800Z raise RuntimeError(msg) 2025-12-04T10:13:48.1046908Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.1046914Z 2025-12-04T10:13:48.1047102Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1047709Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.1047714Z 2025-12-04T10:13:48.1047944Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1047951Z 2025-12-04T10:13:48.1048092Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.1048200Z Traceback (most recent call last): 2025-12-04T10:13:48.1048677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1048799Z getattr(self, test_name)() 2025-12-04T10:13:48.1049273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1049351Z fn() 2025-12-04T10:13:48.1049800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1049888Z method(*args, **kwargs) 2025-12-04T10:13:48.1050326Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1050423Z method(*args, **kwargs) 2025-12-04T10:13:48.1050887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1050973Z with policy(): 2025-12-04T10:13:48.1051419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1051513Z raise RuntimeError(msg) 2025-12-04T10:13:48.1052595Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 611254272 and is now 625934336. 
2025-12-04T10:13:48.1052601Z 2025-12-04T10:13:48.1052786Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1053481Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.1053490Z 2025-12-04T10:13:48.1053905Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1053911Z 2025-12-04T10:13:48.1053916Z 2025-12-04T10:13:48.1054132Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.1054400Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.1055238Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d7cc16231ece4156.xml - 2025-12-04T10:13:48.1055413Z =========================== short test summary info ============================ 2025-12-04T10:13:48.1056258Z FAILED [39.2455s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.1056371Z Traceback (most recent call last): 2025-12-04T10:13:48.1056924Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1057032Z getattr(self, test_name)() 2025-12-04T10:13:48.1057598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1057684Z fn() 2025-12-04T10:13:48.1058188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1058298Z method(*args, **kwargs) 2025-12-04T10:13:48.1058796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1058894Z method(*args, **kwargs) 2025-12-04T10:13:48.1059397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1059491Z with policy(): 2025-12-04T10:13:48.1059996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1060098Z raise RuntimeError(msg) 2025-12-04T10:13:48.1061311Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T10:13:48.1061353Z 2025-12-04T10:13:48.1061568Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1062249Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.1062254Z 2025-12-04T10:13:48.1062515Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1062549Z 2025-12-04T10:13:48.1062707Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.1062823Z Traceback (most recent call last): 2025-12-04T10:13:48.1063369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1063475Z getattr(self, test_name)() 2025-12-04T10:13:48.1064012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1064094Z fn() 2025-12-04T10:13:48.1064593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1064695Z method(*args, **kwargs) 2025-12-04T10:13:48.1065194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1065296Z method(*args, **kwargs) 2025-12-04T10:13:48.1065893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1065980Z with policy(): 2025-12-04T10:13:48.1066470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1066575Z raise RuntimeError(msg) 2025-12-04T10:13:48.1067773Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 611254272 and is now 625934336. 2025-12-04T10:13:48.1067786Z 2025-12-04T10:13:48.1067988Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1068649Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.1068656Z 2025-12-04T10:13:48.1068909Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1069076Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.1069249Z ====================== 1 failed, 32 deselected in 39.46s ======================= 2025-12-04T10:13:48.1069371Z Got exit code 1 2025-12-04T10:13:48.1069469Z Retrying single test... 
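The repeated RuntimeError above comes from PyTorch's CUDA memory-leak checker, enabled for these runs via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1. It snapshots per-device memory before and after the test body and, as the "CUDA driver API confirmed a leak" wording suggests, only fails when the driver-level numbers grow along with the caching-allocator numbers. A minimal sketch of that idea using public torch.cuda APIs follows; the check_for_leak helper and its fn/device parameters are illustrative and are not the internal CudaMemoryLeakCheck implementation the test harness actually uses.

import torch

def check_for_leak(fn, device=0):
    # Snapshot caching-allocator usage and driver-level usage before the test body.
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)      # bytes held by the caching allocator
    free_before, total = torch.cuda.mem_get_info(device)    # driver-level free/total bytes
    driver_before = total - free_before

    fn()  # run the test body

    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after

    # Only flag a leak when the driver also reports growth, so transient
    # caching-allocator churn alone does not trip the check.
    if alloc_after > alloc_before and driver_after > driver_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: "
            f"allocator {alloc_before} -> {alloc_after}, "
            f"driver {driver_before} -> {driver_after}"
        )

In the failures above the caching allocator goes from 512 to 12800 bytes and the driver-reported allocation also grows on every rank, which is why each retry of the test fails the same way.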
2025-12-04T10:13:48.1070077Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-01da52837cd28026.xml 2025-12-04T10:13:48.1070226Z ============================= test session starts ============================== 2025-12-04T10:13:48.1070553Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.1070662Z cachedir: .pytest_cache 2025-12-04T10:13:48.1071153Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.1071274Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.1071371Z configfile: pytest.ini 2025-12-04T10:13:48.1071887Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.1072127Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.1072858Z stepcurrent: skipping 11 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.1072964Z Running 1 items in this shard 2025-12-04T10:13:48.1072969Z 2025-12-04T10:13:48.1073985Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda I1204 09:45:37.279000 60998 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 61050 2025-12-04T10:13:48.1074576Z I1204 09:45:37.280000 60998 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 61051 2025-12-04T10:13:48.1075151Z I1204 09:45:37.281000 60998 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 61052 2025-12-04T10:13:48.1075581Z I1204 09:45:37.282000 60998 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 61053 2025-12-04T10:13:48.1077374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1077458Z _warn_cpu_init() 2025-12-04T10:13:48.1079583Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1079683Z _warn_cpu_init() 2025-12-04T10:13:48.1081727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1081828Z _warn_cpu_init() 2025-12-04T10:13:48.1083855Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1083958Z _warn_cpu_init() 2025-12-04T10:13:48.1084945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.1085057Z return func(*args, **kwargs) 2025-12-04T10:13:48.1085510Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1086041Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1087041Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1087593Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1088579Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1088969Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1089969Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1090455Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1091515Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1091946Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1092790Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1093226Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1094315Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1094847Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1097824Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 707723264 and is now 734986240. 2025-12-04T10:13:48.1098498Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1099730Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1101818Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.1102482Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1103768Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1104724Z [rank0]:E1204 09:46:33.129000 61050 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.1105535Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1106604Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1108457Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1109375Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1111160Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1111976Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1113797Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1114734Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1116500Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1117389Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1119095Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1119909Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1121804Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1122762Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1125578Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:48.1126217Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1127395Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1129327Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.1129910Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1131104Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1131959Z [rank1]:E1204 09:46:33.129000 61051 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.1132486Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1132983Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1134231Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1134734Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1135767Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1136160Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1137126Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1137610Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1138558Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1139048Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1139999Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1140485Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1141439Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1141928Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1143632Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T10:13:48.1143995Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1144656Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1145889Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.1146340Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1147094Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1147610Z [rank2]:E1204 09:46:33.130000 61052 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.1148008Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1148474Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1149355Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1149827Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1150700Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1151054Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1151900Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1152326Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1153170Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1153778Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1154870Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1155303Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1156227Z [rank3]:E1204 09:46:33.130000 61053 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1156705Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1158368Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T10:13:48.1158720Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1159360Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1160452Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.1160806Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1161523Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1162055Z [rank3]:E1204 09:46:33.130000 61053 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.1162149Z dist init r=1, world=4 2025-12-04T10:13:48.1162240Z dist init r=0, world=4 2025-12-04T10:13:48.1162336Z dist init r=2, world=4 2025-12-04T10:13:48.1162425Z dist init r=3, world=4 2025-12-04T10:13:48.1163549Z [rank0]:[W1204 09:46:33.644502469 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.1163678Z FAILED [57.4395s] [100%] 2025-12-04T10:13:48.1163686Z 2025-12-04T10:13:48.1163829Z =================================== FAILURES =================================== 2025-12-04T10:13:48.1164135Z ___ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda ___ 2025-12-04T10:13:48.1164250Z Traceback (most recent call last): 2025-12-04T10:13:48.1164776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.1164886Z self._join_processes(fn) 2025-12-04T10:13:48.1165444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.1165695Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.1166258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.1166362Z raise RuntimeError(error) 2025-12-04T10:13:48.1166584Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.1166802Z Traceback (most recent call last): 2025-12-04T10:13:48.1167274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1167400Z getattr(self, test_name)() 2025-12-04T10:13:48.1167866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1167947Z fn() 2025-12-04T10:13:48.1168388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1168478Z method(*args, **kwargs) 2025-12-04T10:13:48.1168924Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1169010Z method(*args, **kwargs) 2025-12-04T10:13:48.1169474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1169566Z with policy(): 2025-12-04T10:13:48.1170011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1170110Z raise RuntimeError(msg) 2025-12-04T10:13:48.1171187Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.1171194Z 2025-12-04T10:13:48.1171380Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1171994Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.1171999Z 2025-12-04T10:13:48.1172231Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1172262Z 2025-12-04T10:13:48.1172267Z 2025-12-04T10:13:48.1172462Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.1172693Z Process 2 terminated with exit code 10, terminating remaining processes. 
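The warnings interleaved with these failures all point at the same placement and cleanup hygiene: FSDP suggests passing device_id so sharding initialization does not run on CPU, c10d suggests passing device_id to init_process_group so barrier() does not have to infer the device, and ProcessGroupNCCL warns that destroy_process_group() was never called before exit. A minimal per-rank sketch that follows those recommendations is below; it assumes a torchrun-style launch (LOCAL_RANK set in the environment) and uses a toy nn.Linear in place of the models exercised by test_fsdp_core.py.

import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    # torchrun sets LOCAL_RANK (and MASTER_ADDR/MASTER_PORT for the default env:// init).
    local_rank = int(os.environ["LOCAL_RANK"])
    device = torch.device("cuda", local_rank)
    torch.cuda.set_device(device)

    # Passing device_id here addresses the barrier() UserWarning from c10d_logger.py.
    dist.init_process_group(backend="nccl", device_id=device)

    # Passing device_id to FSDP avoids the "sharding initialization run on CPU" warning.
    model = FSDP(nn.Linear(16, 16), device_id=device)

    out = model(torch.randn(4, 16, device=device))
    out.sum().backward()

    # Explicit teardown avoids the ProcessGroupNCCL "destroy_process_group() was not
    # called before program exit" warning seen above.
    dist.destroy_process_group()

if __name__ == "__main__":
    main()

None of this bears on the leak being reported, but it would keep the per-rank logs free of the warnings that currently surround every failing attempt.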
2025-12-04T10:13:48.1173465Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-01da52837cd28026.xml - 2025-12-04T10:13:48.1173780Z =========================== short test summary info ============================ 2025-12-04T10:13:48.1174636Z FAILED [57.4395s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.1174793Z Traceback (most recent call last): 2025-12-04T10:13:48.1175339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1175453Z getattr(self, test_name)() 2025-12-04T10:13:48.1175984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1176068Z fn() 2025-12-04T10:13:48.1176571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1176669Z method(*args, **kwargs) 2025-12-04T10:13:48.1177161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1177266Z method(*args, **kwargs) 2025-12-04T10:13:48.1177761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1177856Z with policy(): 2025-12-04T10:13:48.1178359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1178464Z raise RuntimeError(msg) 2025-12-04T10:13:48.1179966Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.1179975Z 2025-12-04T10:13:48.1180186Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1180877Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.1180886Z 2025-12-04T10:13:48.1181146Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1181318Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:13:48.1181538Z ====================== 1 failed, 32 deselected in 57.66s ======================= 2025-12-04T10:13:48.1181636Z Got exit code 1 2025-12-04T10:13:48.1182246Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T10:13:48.1182647Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.1183260Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aa4201c32172891c.xml 2025-12-04T10:13:48.1183422Z ============================= test session starts ============================== 2025-12-04T10:13:48.1183768Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.1183875Z cachedir: .pytest_cache 2025-12-04T10:13:48.1184390Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.1184550Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.1184656Z configfile: pytest.ini 2025-12-04T10:13:48.1185186Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.1185395Z collecting ... collected 60 items / 12 deselected / 48 selected 2025-12-04T10:13:48.1185536Z stepcurrent: skipping 12 already run items. 2025-12-04T10:13:48.1185643Z Running 21 items in this shard 2025-12-04T10:13:48.1185649Z 2025-12-04T10:13:48.1186737Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda I1204 09:46:39.439000 61335 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 61387 2025-12-04T10:13:48.1187277Z I1204 09:46:39.440000 61335 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 61388 2025-12-04T10:13:48.1187764Z I1204 09:46:39.441000 61335 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 61389 2025-12-04T10:13:48.1188260Z I1204 09:46:39.442000 61335 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 61390 2025-12-04T10:13:48.1190264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1190367Z _warn_cpu_init() 2025-12-04T10:13:48.1192482Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.1192583Z _warn_cpu_init() 2025-12-04T10:13:48.1194464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1194567Z _warn_cpu_init() 2025-12-04T10:13:48.1196460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1196558Z _warn_cpu_init() 2025-12-04T10:13:48.1197659Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.1197765Z return func(*args, **kwargs) 2025-12-04T10:13:48.1198212Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1198729Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1199734Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1200218Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1201170Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1201588Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1202511Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1202984Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1203910Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1204478Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1205445Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1205838Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1206736Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1207168Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1208678Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 714014720 and is now 734986240. 2025-12-04T10:13:48.1208998Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1209611Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1210749Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1211301Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1212380Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1213171Z [rank0]:E1204 09:47:35.231000 61387 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.1214028Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1214555Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1215561Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1216063Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1217093Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1217491Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1218448Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1218932Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1219885Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1220371Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1221326Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1221797Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1222755Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1223239Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1224973Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T10:13:48.1225336Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1226079Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1227118Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1227436Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1228070Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1228577Z [rank1]:E1204 09:47:35.231000 61388 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.1228976Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1229439Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1230327Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1230823Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1231695Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1232049Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1232892Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1233324Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1234173Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1234604Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1235473Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1235861Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1236712Z [rank2]:E1204 09:47:35.232000 61389 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1237141Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1238676Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T10:13:48.1238997Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1239582Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1240621Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1240941Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1241603Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1242081Z [rank2]:E1204 09:47:35.232000 61389 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.1242479Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1242943Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1243850Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1244294Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1245160Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1245516Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1246360Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1246794Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1247636Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1248093Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1248933Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1249324Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1250182Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1250635Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1252154Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 246349824 and is now 625934336. 2025-12-04T10:13:48.1252472Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1253060Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1254408Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1254806Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1255529Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1256068Z [rank3]:E1204 09:47:35.232000 61390 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.1256217Z dist init r=3, world=4 2025-12-04T10:13:48.1256311Z dist init r=1, world=4 2025-12-04T10:13:48.1256402Z dist init r=0, world=4 2025-12-04T10:13:48.1256504Z dist init r=2, world=4 2025-12-04T10:13:48.1257659Z [rank0]:[W1204 09:47:35.751419102 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.1257765Z FAILED [57.2930s] [ 4%] 2025-12-04T10:13:48.1257772Z 2025-12-04T10:13:48.1257916Z =================================== FAILURES =================================== 2025-12-04T10:13:48.1258258Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:48.1258385Z Traceback (most recent call last): 2025-12-04T10:13:48.1258921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.1259032Z self._join_processes(fn) 2025-12-04T10:13:48.1259616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.1259751Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.1260364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.1260475Z raise RuntimeError(error) 2025-12-04T10:13:48.1260735Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.1260858Z Traceback (most recent call last): 2025-12-04T10:13:48.1261389Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1261495Z getattr(self, test_name)() 2025-12-04T10:13:48.1262028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1262117Z fn() 2025-12-04T10:13:48.1262623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1262723Z method(*args, **kwargs) 2025-12-04T10:13:48.1263247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1263354Z method(*args, **kwargs) 2025-12-04T10:13:48.1263855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1263953Z with policy(): 2025-12-04T10:13:48.1264454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1264556Z raise RuntimeError(msg) 2025-12-04T10:13:48.1265913Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 246349824 and is now 625934336. 
2025-12-04T10:13:48.1265922Z 2025-12-04T10:13:48.1266218Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1266895Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1266900Z 2025-12-04T10:13:48.1267134Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1267139Z 2025-12-04T10:13:48.1267143Z 2025-12-04T10:13:48.1267333Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.1267569Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.1268271Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aa4201c32172891c.xml - 2025-12-04T10:13:48.1268452Z =========================== short test summary info ============================ 2025-12-04T10:13:48.1269239Z FAILED [57.2930s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.1269344Z Traceback (most recent call last): 2025-12-04T10:13:48.1269835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1269928Z getattr(self, test_name)() 2025-12-04T10:13:48.1270403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1270478Z fn() 2025-12-04T10:13:48.1270925Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1271022Z method(*args, **kwargs) 2025-12-04T10:13:48.1271461Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1271547Z method(*args, **kwargs) 2025-12-04T10:13:48.1271999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1272078Z with policy(): 2025-12-04T10:13:48.1272557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1272650Z raise RuntimeError(msg) 2025-12-04T10:13:48.1273767Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 246349824 and is now 625934336. 2025-12-04T10:13:48.1273781Z 2025-12-04T10:13:48.1273967Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1274611Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1274644Z 2025-12-04T10:13:48.1274878Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1275032Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
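Every rank also prints the same _warn_cpu_init() UserWarning: the module handed to FSDP is still on CPU, so sharding initialization runs on CPU and sync_module_states=True cannot work. A minimal sketch of the fix the warning itself suggests is below, assuming a process group is already initialized; the module and local_rank names are placeholders, not taken from the test.

# Hedged sketch of the UserWarning's recommendation: pass `device_id` so FSDP
# moves the CPU module to the local GPU before sharding init. Placeholder model.
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(local_rank: int) -> FSDP:
    model = nn.Linear(1024, 1024)  # stand-in for the test's CPU-resident module
    return FSDP(
        model,
        device_id=torch.device("cuda", local_rank),  # silences _warn_cpu_init()
        sync_module_states=True,                     # now valid: GPU communication possible
    )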
2025-12-04T10:13:48.1275185Z ====================== 1 failed, 12 deselected in 57.51s ======================= 2025-12-04T10:13:48.1275270Z Got exit code 1 2025-12-04T10:13:48.1275358Z Retrying single test... 2025-12-04T10:13:48.1275914Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4da7b120579aed6b.xml 2025-12-04T10:13:48.1276050Z ============================= test session starts ============================== 2025-12-04T10:13:48.1276354Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.1276448Z cachedir: .pytest_cache 2025-12-04T10:13:48.1276898Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.1277033Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.1277122Z configfile: pytest.ini 2025-12-04T10:13:48.1277593Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.1277788Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.1278501Z stepcurrent: skipping 12 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1278779Z Running 1 items in this shard 2025-12-04T10:13:48.1278785Z 2025-12-04T10:13:48.1280026Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda I1204 09:47:41.720000 61672 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 61724 2025-12-04T10:13:48.1280525Z I1204 09:47:41.720000 61672 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 61725 2025-12-04T10:13:48.1281026Z I1204 09:47:41.721000 61672 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 61726 2025-12-04T10:13:48.1281512Z I1204 09:47:41.722000 61672 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 61727 2025-12-04T10:13:48.1283527Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1283628Z _warn_cpu_init() 2025-12-04T10:13:48.1285695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.1285791Z _warn_cpu_init() 2025-12-04T10:13:48.1287776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1287921Z _warn_cpu_init() 2025-12-04T10:13:48.1289910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1290010Z _warn_cpu_init() 2025-12-04T10:13:48.1290992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.1291108Z return func(*args, **kwargs) 2025-12-04T10:13:48.1291646Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1292149Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1293036Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1293538Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1294734Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1295128Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1296090Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1296570Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1297521Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1298014Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1298964Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1299411Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1300394Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1300887Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1302626Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T10:13:48.1302997Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1303650Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1304834Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1305204Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1306023Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1306552Z [rank1]:E1204 09:48:23.936000 61725 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.1306947Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1307412Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1308298Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1308767Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1309644Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1309996Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1310844Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1311268Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1312110Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1312542Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1313425Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1313820Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1314665Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1315100Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1316636Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 
2025-12-04T10:13:48.1316962Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1317540Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1318578Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1318900Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1319561Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1320049Z [rank0]:E1204 09:48:23.937000 61724 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.1320444Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1320908Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1321819Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1322265Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1323139Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1323485Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1324330Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1324757Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1325601Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1326057Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1326899Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1327293Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1328142Z [rank2]:E1204 09:48:23.938000 61726 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1328601Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1330114Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:48.1330433Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1331016Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1332054Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1332402Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1333031Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1333567Z [rank2]:E1204 09:48:23.938000 61726 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.1334175Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1334743Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1335734Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1336243Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1337223Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1337612Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1338567Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1339050Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1340026Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1340510Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1341460Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1341904Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1342886Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1343381Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1345081Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.1345443Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1346187Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1347252Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1347574Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1348202Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1348713Z [rank3]:E1204 09:48:23.938000 61727 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.1348798Z dist init r=2, world=4 2025-12-04T10:13:48.1348883Z dist init r=0, world=4 2025-12-04T10:13:48.1348970Z dist init r=1, world=4 2025-12-04T10:13:48.1349054Z dist init r=3, world=4 2025-12-04T10:13:48.1350072Z [rank0]:[W1204 09:48:24.447825321 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.1350164Z FAILED [44.4437s] [100%] 2025-12-04T10:13:48.1350169Z 2025-12-04T10:13:48.1350299Z =================================== FAILURES =================================== 2025-12-04T10:13:48.1350604Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:48.1350708Z Traceback (most recent call last): 2025-12-04T10:13:48.1351187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.1351288Z self._join_processes(fn) 2025-12-04T10:13:48.1351803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.1351936Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.1352498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.1352595Z raise RuntimeError(error) 2025-12-04T10:13:48.1352805Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.1352907Z Traceback (most recent call last): 2025-12-04T10:13:48.1353379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1353479Z getattr(self, test_name)() 2025-12-04T10:13:48.1353942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1354021Z fn() 2025-12-04T10:13:48.1354509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1354600Z method(*args, **kwargs) 2025-12-04T10:13:48.1355049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1355138Z method(*args, **kwargs) 2025-12-04T10:13:48.1355578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1355666Z with policy(): 2025-12-04T10:13:48.1356113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1356217Z raise RuntimeError(msg) 2025-12-04T10:13:48.1357330Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 
2025-12-04T10:13:48.1357362Z 2025-12-04T10:13:48.1357547Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1358197Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1358202Z 2025-12-04T10:13:48.1358437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1358442Z 2025-12-04T10:13:48.1358590Z Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.1358694Z Traceback (most recent call last): 2025-12-04T10:13:48.1359202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1359301Z getattr(self, test_name)() 2025-12-04T10:13:48.1359772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1359855Z fn() 2025-12-04T10:13:48.1360299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1360388Z method(*args, **kwargs) 2025-12-04T10:13:48.1360839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1360925Z method(*args, **kwargs) 2025-12-04T10:13:48.1361368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1361451Z with policy(): 2025-12-04T10:13:48.1361892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1361993Z raise RuntimeError(msg) 2025-12-04T10:13:48.1363104Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 
2025-12-04T10:13:48.1363137Z 2025-12-04T10:13:48.1363328Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1363971Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1363976Z 2025-12-04T10:13:48.1364205Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1364212Z 2025-12-04T10:13:48.1364359Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.1364461Z Traceback (most recent call last): 2025-12-04T10:13:48.1364944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1365065Z getattr(self, test_name)() 2025-12-04T10:13:48.1365534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1365615Z fn() 2025-12-04T10:13:48.1366058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1366146Z method(*args, **kwargs) 2025-12-04T10:13:48.1366590Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1366677Z method(*args, **kwargs) 2025-12-04T10:13:48.1367123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1367203Z with policy(): 2025-12-04T10:13:48.1367647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1367777Z raise RuntimeError(msg) 2025-12-04T10:13:48.1368886Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:48.1368891Z 2025-12-04T10:13:48.1369083Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1369722Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1369753Z 2025-12-04T10:13:48.1369984Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1369994Z 2025-12-04T10:13:48.1369998Z 2025-12-04T10:13:48.1370190Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.1370422Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T10:13:48.1371131Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4da7b120579aed6b.xml - 2025-12-04T10:13:48.1371275Z =========================== short test summary info ============================ 2025-12-04T10:13:48.1372063Z FAILED [44.4437s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.1372169Z Traceback (most recent call last): 2025-12-04T10:13:48.1372654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1372752Z getattr(self, test_name)() 2025-12-04T10:13:48.1373281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1373361Z fn() 2025-12-04T10:13:48.1373999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1374135Z method(*args, **kwargs) 2025-12-04T10:13:48.1374647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1374746Z method(*args, **kwargs) 2025-12-04T10:13:48.1375244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1375346Z with policy(): 2025-12-04T10:13:48.1375847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1375952Z raise RuntimeError(msg) 2025-12-04T10:13:48.1377225Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 
2025-12-04T10:13:48.1377236Z 2025-12-04T10:13:48.1377446Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1378179Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1378184Z 2025-12-04T10:13:48.1378444Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1378452Z 2025-12-04T10:13:48.1378801Z Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.1378929Z Traceback (most recent call last): 2025-12-04T10:13:48.1379653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1379838Z getattr(self, test_name)() 2025-12-04T10:13:48.1380366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1380449Z fn() 2025-12-04T10:13:48.1380960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1381062Z method(*args, **kwargs) 2025-12-04T10:13:48.1381564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1381665Z method(*args, **kwargs) 2025-12-04T10:13:48.1382200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1382301Z with policy(): 2025-12-04T10:13:48.1382802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1382909Z raise RuntimeError(msg) 2025-12-04T10:13:48.1384169Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 
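The repro command printed above can also be driven from a short Python wrapper; the test file, test id, and environment variables are copied verbatim from the log, while the wrapper itself is only an illustrative convenience:

import os
import subprocess
import sys

env = dict(os.environ)
env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"   # enable the allocator/driver leak check
# env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"   # uncomment to hide the repro banner on failure

# Run from the base repo dir, as the log instructs.
subprocess.run(
    [
        sys.executable,
        "test/distributed/fsdp/test_fsdp_core.py",
        "TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda",
    ],
    env=env,
    check=True,  # raise CalledProcessError on a non-zero exit, mirroring "Got exit code 1"
)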
2025-12-04T10:13:48.1384176Z 2025-12-04T10:13:48.1384386Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1385114Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1385122Z 2025-12-04T10:13:48.1385380Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1385386Z 2025-12-04T10:13:48.1385550Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.1385664Z Traceback (most recent call last): 2025-12-04T10:13:48.1386206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1386322Z getattr(self, test_name)() 2025-12-04T10:13:48.1386885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1386978Z fn() 2025-12-04T10:13:48.1387476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1387578Z method(*args, **kwargs) 2025-12-04T10:13:48.1388081Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1388180Z method(*args, **kwargs) 2025-12-04T10:13:48.1388675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1388778Z with policy(): 2025-12-04T10:13:48.1389315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1389429Z raise RuntimeError(msg) 2025-12-04T10:13:48.1390675Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:48.1390681Z 2025-12-04T10:13:48.1390890Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1391675Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1391680Z 2025-12-04T10:13:48.1391914Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1392109Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.1392264Z ====================== 1 failed, 32 deselected in 44.66s ======================= 2025-12-04T10:13:48.1392346Z Got exit code 1 2025-12-04T10:13:48.1392443Z Retrying single test... 
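"Retrying single test..." and the later "FAILED CONSISTENTLY ... continuing with the rest of the tests due to continue-through-error being set" describe a retry-and-classify step: the single failing test is re-run in a fresh pytest session, and only a second failure marks it as consistent rather than flaky. A rough sketch of that control flow, with run_pytest() as a hypothetical stand-in for the actual CI runner:

import subprocess
import sys

def run_pytest(test_id):
    # Hypothetical stand-in: one fresh pytest session restricted to a single test id.
    return subprocess.run([sys.executable, "-m", "pytest", test_id, "-x"]).returncode

def retry_single_test(test_id, continue_through_error=True):
    if run_pytest(test_id) == 0:
        return  # first run passed
    print("Got exit code 1")
    print("Retrying single test...")
    if run_pytest(test_id) == 0:
        print(f"FLAKY: {test_id}")            # passed on retry
        return
    print(f"FAILED CONSISTENTLY: {test_id}")  # failed twice in a row
    if not continue_through_error:
        raise SystemExit(1)
    # otherwise fall through and keep running the remaining tests in the shard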
2025-12-04T10:13:48.1392998Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8c1fa5c204db7919.xml 2025-12-04T10:13:48.1393142Z ============================= test session starts ============================== 2025-12-04T10:13:48.1393446Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.1393563Z cachedir: .pytest_cache 2025-12-04T10:13:48.1394023Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.1394125Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.1394216Z configfile: pytest.ini 2025-12-04T10:13:48.1394690Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.1394878Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.1395597Z stepcurrent: skipping 12 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1395694Z Running 1 items in this shard 2025-12-04T10:13:48.1395699Z 2025-12-04T10:13:48.1396660Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda I1204 09:48:30.380000 62009 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 62061 2025-12-04T10:13:48.1397107Z I1204 09:48:30.381000 62009 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 62062 2025-12-04T10:13:48.1397543Z I1204 09:48:30.381000 62009 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 62063 2025-12-04T10:13:48.1398020Z I1204 09:48:30.382000 62009 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 62064 2025-12-04T10:13:48.1399812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1399902Z _warn_cpu_init() 2025-12-04T10:13:48.1401689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1401783Z _warn_cpu_init() 2025-12-04T10:13:48.1402658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T10:13:48.1402751Z return func(*args, **kwargs) 2025-12-04T10:13:48.1404526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1404637Z _warn_cpu_init() 2025-12-04T10:13:48.1406406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1406513Z _warn_cpu_init() 2025-12-04T10:13:48.1406918Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1407389Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1408274Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1408724Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1409595Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1409950Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1410793Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1411598Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1412447Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1412874Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1413993Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1414542Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1415518Z [rank0]:E1204 09:49:11.660000 
62061 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1416004Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1417712Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 716111872 and is now 734986240. 2025-12-04T10:13:48.1418073Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1418761Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1419957Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1420315Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1421038Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1421607Z [rank0]:E1204 09:49:11.660000 62061 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.1422064Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1422590Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1423577Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1424086Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1425066Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1425464Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1426497Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1426929Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1427773Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1428201Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1429076Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1429476Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1430328Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1430756Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1432269Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.1432613Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1433197Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1434243Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1434598Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1435234Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1435718Z [rank2]:E1204 09:49:11.661000 62063 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.1436118Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1436582Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1437462Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1437913Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1438782Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1439163Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1440007Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1440444Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1441287Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1441741Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1442591Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1442983Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1443839Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1444269Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1445780Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 
2025-12-04T10:13:48.1446125Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1446705Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1447787Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1448107Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1448745Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1449224Z [rank1]:E1204 09:49:11.661000 62062 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.1449623Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1450087Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1450968Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1451426Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1452320Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1452674Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1453587Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1454240Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1455229Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1455713Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1456667Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1457108Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1458072Z [rank3]:E1204 09:49:11.662000 62064 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1458591Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1460300Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 611254272 and is now 625934336. 2025-12-04T10:13:48.1460659Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1461339Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1462521Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1462882Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1463595Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1464133Z [rank3]:E1204 09:49:11.662000 62064 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.1464240Z dist init r=1, world=4 2025-12-04T10:13:48.1464334Z dist init r=2, world=4 2025-12-04T10:13:48.1464424Z dist init r=3, world=4 2025-12-04T10:13:48.1464522Z dist init r=0, world=4 2025-12-04T10:13:48.1465673Z [rank0]:[W1204 09:49:12.172529411 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.1465773Z FAILED [42.9154s] [100%] 2025-12-04T10:13:48.1465779Z 2025-12-04T10:13:48.1466057Z =================================== FAILURES =================================== 2025-12-04T10:13:48.1466362Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:48.1466469Z Traceback (most recent call last): 2025-12-04T10:13:48.1466949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.1467045Z self._join_processes(fn) 2025-12-04T10:13:48.1467563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.1467685Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.1468241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.1468344Z raise RuntimeError(error) 2025-12-04T10:13:48.1468549Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.1468656Z Traceback (most recent call last): 2025-12-04T10:13:48.1469126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1469220Z getattr(self, test_name)() 2025-12-04T10:13:48.1469695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1469773Z fn() 2025-12-04T10:13:48.1470220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1470309Z method(*args, **kwargs) 2025-12-04T10:13:48.1470781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1470874Z method(*args, **kwargs) 2025-12-04T10:13:48.1471313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1471393Z with policy(): 2025-12-04T10:13:48.1471843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1471936Z raise RuntimeError(msg) 2025-12-04T10:13:48.1473048Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 611254272 and is now 625934336. 
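The ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit, which can leak resources") points at missing teardown in the per-rank processes. A minimal sketch of the recommended init/teardown pairing, assuming NCCL, a single node, and that RANK/WORLD_SIZE and the rendezvous variables are provided by a launcher such as torchrun:

import os
import torch
import torch.distributed as dist

def main():
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    torch.cuda.set_device(rank)  # single-node assumption: rank == local GPU index
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        dist.barrier()  # stand-in for the real per-rank work
    finally:
        dist.destroy_process_group()  # explicit teardown silences the NCCL warning

if __name__ == "__main__":
    main()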
2025-12-04T10:13:48.1473080Z 2025-12-04T10:13:48.1473267Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1473916Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1473925Z 2025-12-04T10:13:48.1474158Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1474163Z 2025-12-04T10:13:48.1474167Z 2025-12-04T10:13:48.1474356Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.1474594Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.1475296Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8c1fa5c204db7919.xml - 2025-12-04T10:13:48.1475449Z =========================== short test summary info ============================ 2025-12-04T10:13:48.1476236Z FAILED [42.9154s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.1476341Z Traceback (most recent call last): 2025-12-04T10:13:48.1476854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1476955Z getattr(self, test_name)() 2025-12-04T10:13:48.1477424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1477508Z fn() 2025-12-04T10:13:48.1477950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1478046Z method(*args, **kwargs) 2025-12-04T10:13:48.1478489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1478579Z method(*args, **kwargs) 2025-12-04T10:13:48.1479412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1479508Z with policy(): 2025-12-04T10:13:48.1480016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1480119Z raise RuntimeError(msg) 2025-12-04T10:13:48.1481365Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 611254272 and is now 625934336. 2025-12-04T10:13:48.1481373Z 2025-12-04T10:13:48.1481591Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1482315Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1482359Z 2025-12-04T10:13:48.1482625Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1482800Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:13:48.1482973Z ====================== 1 failed, 32 deselected in 43.13s ======================= 2025-12-04T10:13:48.1483069Z Got exit code 1 2025-12-04T10:13:48.1483709Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.1484121Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.1484779Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-79d8c8140e8d4a45.xml 2025-12-04T10:13:48.1484934Z ============================= test session starts ============================== 2025-12-04T10:13:48.1485286Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.1485390Z cachedir: .pytest_cache 2025-12-04T10:13:48.1485907Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.1486030Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.1486132Z configfile: pytest.ini 2025-12-04T10:13:48.1486665Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.1486876Z collecting ... collected 60 items / 13 deselected / 47 selected 2025-12-04T10:13:48.1487012Z stepcurrent: skipping 13 already run items. 2025-12-04T10:13:48.1487125Z Running 20 items in this shard 2025-12-04T10:13:48.1487131Z 2025-12-04T10:13:48.1488180Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda I1204 09:49:18.099000 62346 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 62398 2025-12-04T10:13:48.1488719Z I1204 09:49:18.100000 62346 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 62399 2025-12-04T10:13:48.1489210Z I1204 09:49:18.101000 62346 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 62400 2025-12-04T10:13:48.1489691Z I1204 09:49:18.102000 62346 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 62401 2025-12-04T10:13:48.1491805Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1491893Z _warn_cpu_init() 2025-12-04T10:13:48.1493913Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.1494009Z _warn_cpu_init() 2025-12-04T10:13:48.1495719Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1495933Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1497635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1497796Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1499828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1499923Z _warn_cpu_init() 2025-12-04T10:13:48.1501929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1502030Z _warn_cpu_init() 2025-12-04T10:13:48.1503728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1503893Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1505622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1505789Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1506793Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.1507038Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.1508547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1508688Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1509573Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1509768Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.1510640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1510880Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.1511759Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1511949Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.1512819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1513061Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.1514567Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1514713Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1515582Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1515774Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.1516649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1516858Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.1518390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1518531Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1519409Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1519602Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.1523639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.1524018Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.1524706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.1524802Z return func(*args, **kwargs) 2025-12-04T10:13:48.1528763Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.1529136Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.1529825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T10:13:48.1529922Z return func(*args, **kwargs) 2025-12-04T10:13:48.1534202Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.1534596Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.1535396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.1535507Z return func(*args, **kwargs) 2025-12-04T10:13:48.1539985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.1540406Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.1541180Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.1541284Z return func(*args, **kwargs) 2025-12-04T10:13:48.1542035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.1542176Z return func(*args, **kwargs) 2025-12-04T10:13:48.1542926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
2025-12-04T10:13:48.1543039Z return func(*args, **kwargs) 2025-12-04T10:13:48.1543792Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.1543897Z return func(*args, **kwargs) 2025-12-04T10:13:48.1544656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.1544757Z return func(*args, **kwargs) 2025-12-04T10:13:48.1545753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.1545962Z return func(*args, **kwargs) 2025-12-04T10:13:48.1546368Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1546874Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1547752Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1548202Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1549071Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1549460Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1550311Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1550739Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1551586Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1552013Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1552859Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1553279Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1554136Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1554564Z [rank0]:E1204 
09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1556080Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 718209024 and is now 10524491776. 2025-12-04T10:13:48.1556410Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1556992Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1558007Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1558327Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1558963Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1559445Z [rank0]:E1204 09:49:27.599000 62398 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.1559862Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1560334Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1561216Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1561667Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1562563Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1562918Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1563762Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1564186Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1565039Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1565465Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T10:13:48.1566341Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1566729Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1567577Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1568032Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1569517Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 611254272 and is now 10413342720. 2025-12-04T10:13:48.1569842Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1570419Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1571430Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1571747Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1572387Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1572891Z [rank3]:E1204 09:49:27.601000 62401 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.1573338Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1574004Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1574998Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1575540Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1576525Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1576919Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1577880Z [rank1]:E1204 09:49:27.601000 62399 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1578364Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1579521Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1580067Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1581026Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1581463Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1582463Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1582947Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1584619Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 1. CUDA driver allocated memory was 604962816 and is now 10413342720. 
2025-12-04T10:13:48.1584985Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1585639Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1586783Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1587145Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1587902Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1588445Z [rank1]:E1204 09:49:27.601000 62399 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.1588890Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1589428Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1590571Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1591158Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1592033Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1592387Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1593234Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1593692Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1594548Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1594973Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1595825Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1596243Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1597097Z [rank2]:E1204 09:49:27.601000 62400 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1597533Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1599021Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 604962816 and is now 10413342720. 2025-12-04T10:13:48.1599345Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1599924Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1600978Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1601298Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1601940Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1602420Z [rank2]:E1204 09:49:27.601000 62400 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.1602508Z dist init r=3, world=4 2025-12-04T10:13:48.1602599Z dist init r=0, world=4 2025-12-04T10:13:48.1602682Z dist init r=2, world=4 2025-12-04T10:13:48.1602766Z dist init r=1, world=4 2025-12-04T10:13:48.1603821Z [rank3]:[W1204 09:49:27.109939322 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.1604835Z [rank0]:[W1204 09:49:28.111145253 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.1605845Z [rank2]:[W1204 09:49:28.115206257 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.1606848Z [rank1]:[W1204 09:49:28.119626173 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.1606970Z FAILED [26.8637s] [ 5%] 2025-12-04T10:13:48.1606978Z 2025-12-04T10:13:48.1607103Z =================================== FAILURES =================================== 2025-12-04T10:13:48.1607379Z __ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda ___ 2025-12-04T10:13:48.1607491Z Traceback (most recent call last): 2025-12-04T10:13:48.1607972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.1608105Z self._join_processes(fn) 2025-12-04T10:13:48.1608617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.1608739Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.1609280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.1609381Z raise RuntimeError(error) 2025-12-04T10:13:48.1609587Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.1609699Z Traceback (most recent call last): 2025-12-04T10:13:48.1610171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1610271Z getattr(self, test_name)() 2025-12-04T10:13:48.1610738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1610814Z fn() 2025-12-04T10:13:48.1611268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1611358Z method(*args, **kwargs) 2025-12-04T10:13:48.1611812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1611900Z method(*args, **kwargs) 2025-12-04T10:13:48.1612363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1612456Z with policy(): 2025-12-04T10:13:48.1612905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1612997Z raise RuntimeError(msg) 2025-12-04T10:13:48.1614375Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 604962816 and is now 10413342720. 2025-12-04T10:13:48.1614385Z 2025-12-04T10:13:48.1614630Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1615334Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1615343Z 2025-12-04T10:13:48.1615603Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1615609Z 2025-12-04T10:13:48.1615613Z 2025-12-04T10:13:48.1615834Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.1616090Z Process 2 terminated with exit code 10, terminating remaining processes. 
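The failure above is PyTorch's CUDA mem-leak checker comparing caching-allocator and driver-level memory before and after the test body. As a rough, hedged illustration of that kind of before/after comparison (not the harness's actual implementation; `check_cuda_leak` and `run_workload` are hypothetical names, and this assumes a single visible GPU):

    import gc
    import torch

    def check_cuda_leak(run_workload, device=0):
        # Settle pending GPU work and Python garbage so the "before" numbers are stable.
        torch.cuda.synchronize(device)
        gc.collect()
        before_alloc = torch.cuda.memory_allocated(device)     # caching-allocator bytes in use
        free_before, total = torch.cuda.mem_get_info(device)   # driver-level free/total bytes
        run_workload()
        torch.cuda.synchronize(device)
        gc.collect()
        after_alloc = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        if after_alloc > before_alloc:
            raise RuntimeError(
                f"possible CUDA leak: caching allocator {before_alloc} -> {after_alloc} bytes, "
                f"driver allocated {total - free_before} -> {total - free_after} bytes"
            )

    if __name__ == "__main__":
        check_cuda_leak(lambda: torch.randn(1024, device="cuda").sum().item())

A report like the one above means memory allocated inside the test is still reachable (or still held by communicator state) when the check runs after the test body returns.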
2025-12-04T10:13:48.1616889Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-79d8c8140e8d4a45.xml - 2025-12-04T10:13:48.1617063Z =========================== short test summary info ============================ 2025-12-04T10:13:48.1617911Z FAILED [26.8637s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.1618066Z Traceback (most recent call last): 2025-12-04T10:13:48.1618612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1618721Z getattr(self, test_name)() 2025-12-04T10:13:48.1619259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1619345Z fn() 2025-12-04T10:13:48.1619844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1619980Z method(*args, **kwargs) 2025-12-04T10:13:48.1620475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1620584Z method(*args, **kwargs) 2025-12-04T10:13:48.1621082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1621173Z with policy(): 2025-12-04T10:13:48.1621684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1621789Z raise RuntimeError(msg) 2025-12-04T10:13:48.1623018Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 604962816 and is now 10413342720. 2025-12-04T10:13:48.1623026Z 2025-12-04T10:13:48.1623236Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1623927Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1623934Z 2025-12-04T10:13:48.1624202Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1624469Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.1624650Z ====================== 1 failed, 13 deselected in 27.08s ======================= 2025-12-04T10:13:48.1624743Z Got exit code 1 2025-12-04T10:13:48.1624845Z Retrying single test... 
2025-12-04T10:13:48.1625471Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5b8cddc87d4e2da4.xml 2025-12-04T10:13:48.1625737Z ============================= test session starts ============================== 2025-12-04T10:13:48.1626168Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.1626268Z cachedir: .pytest_cache 2025-12-04T10:13:48.1626748Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.1626863Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.1626957Z configfile: pytest.ini 2025-12-04T10:13:48.1627426Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.1627622Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.1628293Z stepcurrent: skipping 13 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1628402Z Running 1 items in this shard 2025-12-04T10:13:48.1628406Z 2025-12-04T10:13:48.1629327Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda I1204 09:49:49.609000 63451 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 63503 2025-12-04T10:13:48.1629793Z I1204 09:49:49.610000 63451 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 63504 2025-12-04T10:13:48.1630238Z I1204 09:49:49.611000 63451 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 63505 2025-12-04T10:13:48.1630666Z I1204 09:49:49.612000 63451 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 63506 2025-12-04T10:13:48.1632451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1632562Z _warn_cpu_init() 2025-12-04T10:13:48.1634340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1634424Z _warn_cpu_init() 2025-12-04T10:13:48.1636205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1636293Z _warn_cpu_init() 2025-12-04T10:13:48.1637820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1637981Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1639486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1639665Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1641173Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1641322Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1643090Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1643204Z _warn_cpu_init() 2025-12-04T10:13:48.1644703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1644854Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1645733Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1645992Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.1646878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.1647090Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.1647970Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1648178Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.1649701Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1649847Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1651366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1651514Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1652398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1652598Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.1653592Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1653981Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.1654963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1655178Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.1656163Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1656398Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.1658142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:48.1658301Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1659279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1659528Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.1663990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.1664380Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.1665161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.1665296Z return func(*args, **kwargs) 2025-12-04T10:13:48.1669681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.1670029Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.1670714Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.1670808Z return func(*args, **kwargs) 2025-12-04T10:13:48.1674759Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.1675151Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.1675832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.1675931Z return func(*args, **kwargs) 2025-12-04T10:13:48.1680356Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.1680749Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.1681586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.1681696Z return func(*args, **kwargs) 2025-12-04T10:13:48.1682448Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.1682558Z return func(*args, **kwargs) 2025-12-04T10:13:48.1683311Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.1683421Z return func(*args, **kwargs) 2025-12-04T10:13:48.1684206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
2025-12-04T10:13:48.1684311Z return func(*args, **kwargs) 2025-12-04T10:13:48.1685068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.1685167Z return func(*args, **kwargs) 2025-12-04T10:13:48.1686158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.1686271Z return func(*args, **kwargs) 2025-12-04T10:13:48.1686727Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1687305Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1688303Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1688809Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1689788Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1690220Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1691182Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1691741Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1692597Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1693022Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1694133Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1694641Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1695629Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1696120Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1697805Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a 
leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 720306176 and is now 10524491776. 2025-12-04T10:13:48.1698204Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1698859Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1700004Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1700361Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1701074Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1701629Z [rank0]:E1204 09:49:59.144000 63503 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.1702123Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1702655Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1703649Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1704162Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1705171Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1705675Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1706664Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1707090Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1707941Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1708369Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1709224Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T10:13:48.1709643Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1710487Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1710927Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1712427Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 1. CUDA driver allocated memory was 609157120 and is now 10413342720. 2025-12-04T10:13:48.1712756Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1713333Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1714348Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1714670Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1715303Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1715820Z [rank1]:E1204 09:49:59.145000 63504 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.1716213Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1716687Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1717563Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1718230Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1719151Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1719523Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1720429Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1720880Z [rank2]:E1204 09:49:59.146000 63505 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1721778Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1722233Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1723159Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1723574Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1724468Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1724940Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1726541Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 604962816 and is now 10413342720. 2025-12-04T10:13:48.1726885Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1727498Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1728567Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1728934Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1729775Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1730310Z [rank2]:E1204 09:49:59.146000 63505 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.1730740Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1731254Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1732244Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1732742Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, 
test_name)() 2025-12-04T10:13:48.1733924Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1734317Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1735283Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1735765Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1736731Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1737244Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1738194Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1738644Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1739626Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1740120Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1741790Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 607059968 and is now 10413342720. 
2025-12-04T10:13:48.1742154Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1742807Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1743949Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1744337Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1745050Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1745701Z [rank3]:E1204 09:49:59.147000 63506 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.1745827Z dist init r=0, world=4 2025-12-04T10:13:48.1746031Z dist init r=1, world=4 2025-12-04T10:13:48.1746112Z dist init r=3, world=4 2025-12-04T10:13:48.1746194Z dist init r=2, world=4 2025-12-04T10:13:48.1747220Z [rank0]:[W1204 09:49:59.656875417 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.1748224Z [rank1]:[W1204 09:49:59.659087673 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.1749232Z [rank3]:[W1204 09:49:59.659326518 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.1750234Z [rank2]:[W1204 09:49:59.662776896 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.1750332Z FAILED [26.7258s] [100%] 2025-12-04T10:13:48.1750338Z 2025-12-04T10:13:48.1750489Z =================================== FAILURES =================================== 2025-12-04T10:13:48.1750766Z __ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda ___ 2025-12-04T10:13:48.1750878Z Traceback (most recent call last): 2025-12-04T10:13:48.1751355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.1751452Z self._join_processes(fn) 2025-12-04T10:13:48.1751974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.1752095Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.1752668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.1752766Z raise RuntimeError(error) 2025-12-04T10:13:48.1752971Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.1753082Z Traceback (most recent call last): 2025-12-04T10:13:48.1753551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1753649Z getattr(self, test_name)() 2025-12-04T10:13:48.1754123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1754201Z fn() 2025-12-04T10:13:48.1754652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1754743Z method(*args, **kwargs) 2025-12-04T10:13:48.1755184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1755309Z method(*args, **kwargs) 2025-12-04T10:13:48.1755749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1755834Z with policy(): 2025-12-04T10:13:48.1756285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1756377Z raise RuntimeError(msg) 2025-12-04T10:13:48.1757468Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 604962816 and is now 10413342720. 2025-12-04T10:13:48.1757505Z 2025-12-04T10:13:48.1757694Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1758312Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1758319Z 2025-12-04T10:13:48.1758549Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1758555Z 2025-12-04T10:13:48.1758560Z 2025-12-04T10:13:48.1758751Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.1758985Z Process 2 terminated with exit code 10, terminating remaining processes. 
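The retry repeats the same FSDP UserWarnings: the wrapped module is still on CPU at init time, and `device_id` is passed as a bare "cuda" device with no index, so FSDP has to guess the current device. A hedged sketch of the usage those warnings ask for, with a placeholder nn.Linear standing in for the test's model (this is not the failing test's code and assumes a torchrun-style launch):

    import os
    import torch
    import torch.nn as nn
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)           # make the current device explicit first
    dist.init_process_group(backend="nccl")

    model = nn.Linear(16, 16)                   # placeholder module, created on CPU
    fsdp_model = FSDP(
        model,
        device_id=torch.cuda.current_device(),  # explicit index, so FSDP does not guess
        sync_module_states=True,                # FSDP moves the CPU module to this device first
    )

    dist.destroy_process_group()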
2025-12-04T10:13:48.1759691Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5b8cddc87d4e2da4.xml - 2025-12-04T10:13:48.1759847Z =========================== short test summary info ============================ 2025-12-04T10:13:48.1760602Z FAILED [26.7258s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.1760711Z Traceback (most recent call last): 2025-12-04T10:13:48.1761207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1761329Z getattr(self, test_name)() 2025-12-04T10:13:48.1761808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1761882Z fn() 2025-12-04T10:13:48.1762328Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1762426Z method(*args, **kwargs) 2025-12-04T10:13:48.1762867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1762958Z method(*args, **kwargs) 2025-12-04T10:13:48.1763436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1763522Z with policy(): 2025-12-04T10:13:48.1763975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1764068Z raise RuntimeError(msg) 2025-12-04T10:13:48.1765152Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 604962816 and is now 10413342720. 2025-12-04T10:13:48.1765157Z 2025-12-04T10:13:48.1765350Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1765957Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1765962Z 2025-12-04T10:13:48.1766201Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1766382Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.1766541Z ====================== 1 failed, 32 deselected in 26.95s ======================= 2025-12-04T10:13:48.1766631Z Got exit code 1 2025-12-04T10:13:48.1766720Z Retrying single test... 
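The failure above comes from the CUDA memory-leak checker rather than from the test's own assertions: per the RuntimeError, caching-allocator usage on device 2 grew from 512 to 215552 bytes across the test, and the log already gives the exact repro command (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py ...). A rough, simplified sketch of the same kind of before/after allocator comparison; the real check in torch/testing/_internal/common_utils.py also consults CUDA driver allocations, as the error text shows:

    # Rough sketch of a before/after allocator comparison, simplified relative to the
    # real PYTORCH_TEST_CUDA_MEM_LEAK_CHECK machinery (which also inspects driver memory).
    import gc
    import torch

    def run_with_leak_check(fn, device: int = 0) -> None:
        torch.cuda.synchronize(device)
        gc.collect()
        torch.cuda.empty_cache()
        before = torch.cuda.memory_allocated(device)
        fn()
        torch.cuda.synchronize(device)
        gc.collect()
        torch.cuda.empty_cache()
        after = torch.cuda.memory_allocated(device)
        if after > before:
            raise RuntimeError(
                f"possible CUDA memory leak: {before} -> {after} bytes on device {device}"
            )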
2025-12-04T10:13:48.1767280Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-312ddbdab57572f7.xml 2025-12-04T10:13:48.1767419Z ============================= test session starts ============================== 2025-12-04T10:13:48.1767721Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.1767845Z cachedir: .pytest_cache 2025-12-04T10:13:48.1768297Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.1768404Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.1768507Z configfile: pytest.ini 2025-12-04T10:13:48.1768976Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.1769176Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.1769851Z stepcurrent: skipping 13 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1769945Z Running 1 items in this shard 2025-12-04T10:13:48.1769950Z 2025-12-04T10:13:48.1770882Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda I1204 09:50:21.190000 64556 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 64608 2025-12-04T10:13:48.1771320Z I1204 09:50:21.191000 64556 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 64609 2025-12-04T10:13:48.1771763Z I1204 09:50:21.191000 64556 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 64610 2025-12-04T10:13:48.1772217Z I1204 09:50:21.192000 64556 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 64611 2025-12-04T10:13:48.1774294Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1774398Z _warn_cpu_init() 2025-12-04T10:13:48.1776424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1776531Z _warn_cpu_init() 2025-12-04T10:13:48.1778237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:48.1778410Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1780323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1780554Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1782557Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1782695Z _warn_cpu_init() 2025-12-04T10:13:48.1784700Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1784805Z _warn_cpu_init() 2025-12-04T10:13:48.1786508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1786678Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1788407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1788567Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1789571Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1789813Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.1790920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.1791134Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.1792648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1792790Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1793667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1793869Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.1794777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1794980Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.1795847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1796059Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.1797596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1797739Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1798620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1798824Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.1800346Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1800493Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1801411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.1801607Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.1802477Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1802670Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.1806653Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.1807008Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.1807692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.1807821Z return func(*args, **kwargs) 2025-12-04T10:13:48.1811771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.1812147Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.1812828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.1812928Z return func(*args, **kwargs) 2025-12-04T10:13:48.1817561Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.1817961Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.1822449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.1822845Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.1823616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.1823753Z return func(*args, **kwargs) 2025-12-04T10:13:48.1824872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.1825082Z return func(*args, **kwargs) 2025-12-04T10:13:48.1826463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.1826638Z return func(*args, **kwargs) 2025-12-04T10:13:48.1827834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.1828110Z return func(*args, **kwargs) 2025-12-04T10:13:48.1829380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
2025-12-04T10:13:48.1829535Z return func(*args, **kwargs) 2025-12-04T10:13:48.1830672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.1830841Z return func(*args, **kwargs) 2025-12-04T10:13:48.1832416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.1832599Z return func(*args, **kwargs) 2025-12-04T10:13:48.1833292Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1834369Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1836463Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1837376Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1839120Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1839800Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1841650Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1842546Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1844402Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1845323Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1847047Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1847874Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1849650Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1850595Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1853783Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a 
leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 720306176 and is now 10524491776. 2025-12-04T10:13:48.1854680Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1855919Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1858106Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1858814Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1860168Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1861153Z [rank0]:E1204 09:50:30.711000 64608 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.1862028Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1862912Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1864019Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1864522Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1865512Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1866006Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1866896Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1867499Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1868408Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1868865Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1869757Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T10:13:48.1870209Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1871108Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1871566Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1873141Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 604962816 and is now 10413342720. 2025-12-04T10:13:48.1873542Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1874155Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1875270Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1875594Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1876225Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1876713Z [rank2]:E1204 09:50:30.711000 64610 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.1877135Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1877605Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1878486Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1879293Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1880361Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1880758Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1881715Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1882198Z [rank1]:E1204 09:50:30.712000 64609 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1883150Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1883635Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1884630Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1885077Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1886029Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1886570Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1888254Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 1. CUDA driver allocated memory was 609157120 and is now 10413342720. 2025-12-04T10:13:48.1888625Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1889273Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1890403Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1890767Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1891668Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1892189Z [rank1]:E1204 09:50:30.712000 64609 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.1892584Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1893048Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1894219Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1894756Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, 
test_name)() 2025-12-04T10:13:48.1895746Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1896137Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1897103Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1897586Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1898536Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1899057Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1900011Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1900458Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1901442Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1901935Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1903604Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 607059968 and is now 10413342720. 
2025-12-04T10:13:48.1903967Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1904621Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1905860Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1906303Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1906960Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1907444Z [rank3]:E1204 09:50:30.712000 64611 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.1907532Z dist init r=0, world=4 2025-12-04T10:13:48.1907615Z dist init r=1, world=4 2025-12-04T10:13:48.1907706Z dist init r=2, world=4 2025-12-04T10:13:48.1907791Z dist init r=3, world=4 2025-12-04T10:13:48.1908845Z [rank2]:[W1204 09:50:31.221666604 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.1909860Z [rank1]:[W1204 09:50:31.222966359 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.1910861Z [rank0]:[W1204 09:50:31.222977384 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.1911861Z [rank3]:[W1204 09:50:31.232830833 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.1911949Z FAILED [26.7613s] [100%] 2025-12-04T10:13:48.1911983Z 2025-12-04T10:13:48.1912116Z =================================== FAILURES =================================== 2025-12-04T10:13:48.1912390Z __ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda ___ 2025-12-04T10:13:48.1912502Z Traceback (most recent call last): 2025-12-04T10:13:48.1912985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.1913080Z self._join_processes(fn) 2025-12-04T10:13:48.1913601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.1913753Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.1914283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.1914385Z raise RuntimeError(error) 2025-12-04T10:13:48.1914594Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.1914702Z Traceback (most recent call last): 2025-12-04T10:13:48.1915178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1915273Z getattr(self, test_name)() 2025-12-04T10:13:48.1915746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1915822Z fn() 2025-12-04T10:13:48.1916264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1916360Z method(*args, **kwargs) 2025-12-04T10:13:48.1916800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1916892Z method(*args, **kwargs) 2025-12-04T10:13:48.1917336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1917420Z with policy(): 2025-12-04T10:13:48.1917899Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1917992Z raise RuntimeError(msg) 2025-12-04T10:13:48.1919085Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 604962816 and is now 10413342720. 2025-12-04T10:13:48.1919093Z 2025-12-04T10:13:48.1919281Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1919892Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1919899Z 2025-12-04T10:13:48.1920161Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1920166Z 2025-12-04T10:13:48.1920170Z 2025-12-04T10:13:48.1920365Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.1920606Z Process 2 terminated with exit code 10, terminating remaining processes. 
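Both the original run and the retry repeat the same FSDP initialization warnings: the module is wrapped while still on CPU, `device_id` is passed as a bare "cuda" with no index, and the `NO_SHARD` strategy is deprecated in favor of `DistributedDataParallel`. A hedged sketch of the first two remediations, using a placeholder module rather than the test's MixtureOfExperts model and assuming the process group is already initialized:

    # Hedged sketch: pin each rank to its GPU and give FSDP an explicit device index so
    # sharding initialization runs on the GPU. Placeholder module; assumes
    # torch.distributed.init_process_group() has already been called for this rank.
    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_rank(rank: int) -> FSDP:
        torch.cuda.set_device(rank)            # avoids "device_id cuda ... no explicit index" warning
        module = nn.Linear(1024, 1024)         # CPU module; device_id below moves it for init
        return FSDP(module, device_id=rank)    # explicit index; avoids the CPU-init warning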
2025-12-04T10:13:48.1921314Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-312ddbdab57572f7.xml - 2025-12-04T10:13:48.1921463Z =========================== short test summary info ============================ 2025-12-04T10:13:48.1922225Z FAILED [26.7613s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.1922329Z Traceback (most recent call last): 2025-12-04T10:13:48.1922817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1922953Z getattr(self, test_name)() 2025-12-04T10:13:48.1923427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1923510Z fn() 2025-12-04T10:13:48.1923954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1924050Z method(*args, **kwargs) 2025-12-04T10:13:48.1924492Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1924609Z method(*args, **kwargs) 2025-12-04T10:13:48.1925055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1925142Z with policy(): 2025-12-04T10:13:48.1925592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1925690Z raise RuntimeError(msg) 2025-12-04T10:13:48.1926777Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 604962816 and is now 10413342720. 2025-12-04T10:13:48.1926783Z 2025-12-04T10:13:48.1931379Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1932033Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1932047Z 2025-12-04T10:13:48.1932285Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1932447Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:13:48.1932607Z ====================== 1 failed, 32 deselected in 26.98s ======================= 2025-12-04T10:13:48.1932697Z Got exit code 1 2025-12-04T10:13:48.1933412Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T10:13:48.1933979Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.1934600Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-fb1563559edf316c.xml 2025-12-04T10:13:48.1934759Z ============================= test session starts ============================== 2025-12-04T10:13:48.1935110Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.1935215Z cachedir: .pytest_cache 2025-12-04T10:13:48.1935758Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.1935884Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.1935986Z configfile: pytest.ini 2025-12-04T10:13:48.1936512Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.1936732Z collecting ... collected 60 items / 14 deselected / 46 selected 2025-12-04T10:13:48.1936868Z stepcurrent: skipping 14 already run items. 2025-12-04T10:13:48.1936979Z Running 19 items in this shard 2025-12-04T10:13:48.1936986Z 2025-12-04T10:13:48.1938040Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda I1204 09:50:52.649000 65661 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 65713 2025-12-04T10:13:48.1938534Z I1204 09:50:52.650000 65661 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 65714 2025-12-04T10:13:48.1939067Z I1204 09:50:52.651000 65661 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 65715 2025-12-04T10:13:48.1939550Z I1204 09:50:52.652000 65661 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 65716 2025-12-04T10:13:48.1941560Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1941692Z _warn_cpu_init() 2025-12-04T10:13:48.1943707Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.1943802Z _warn_cpu_init() 2025-12-04T10:13:48.1945505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1945669Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1947605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1947696Z _warn_cpu_init() 2025-12-04T10:13:48.1949196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1949344Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1951129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.1951219Z _warn_cpu_init() 2025-12-04T10:13:48.1952724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1952872Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1954400Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1954547Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1955422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.1955661Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.1957169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1957312Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1958196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1958387Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.1959263Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1959472Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.1961007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1961155Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1962029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1962226Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.1963115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1963324Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.1964206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1964392Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.1965262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1965472Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.1966985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.1967154Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.1968027Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.1968126Z return func(*args, **kwargs) 2025-12-04T10:13:48.1969022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.1969216Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.1969898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.1969992Z return func(*args, **kwargs) 2025-12-04T10:13:48.1970666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.1970757Z return func(*args, **kwargs) 2025-12-04T10:13:48.1971429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.1971522Z return func(*args, **kwargs) 2025-12-04T10:13:48.1972188Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.1972289Z return func(*args, **kwargs) 2025-12-04T10:13:48.1972978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.1973073Z return func(*args, **kwargs) 2025-12-04T10:13:48.1973995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.1974097Z return func(*args, **kwargs) 2025-12-04T10:13:48.1974849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.1974953Z return func(*args, **kwargs) 2025-12-04T10:13:48.1975755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
2025-12-04T10:13:48.1975859Z return func(*args, **kwargs) 2025-12-04T10:13:48.1976320Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1976851Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1977840Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1978351Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1979538Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1980005Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1980966Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1981448Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1982447Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1982926Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1983886Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1984327Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1985289Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1985781Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.1987483Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 716111872 and is now 10516103168. 
2025-12-04T10:13:48.1987856Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1988509Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.1989652Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.1990011Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.1990866Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.1991346Z [rank0]:E1204 09:51:02.547000 65713 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.1991743Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.1992210Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.1993097Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.1993550Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.1994452Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.1994797Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.1995644Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1996098Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1996957Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.1997386Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.1998233Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.1998618Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.1999465Z [rank1]:E1204 09:51:02.549000 65714 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.1999899Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2001402Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T10:13:48.2001725Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2002304Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2003340Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2003659Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2004296Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2004771Z [rank1]:E1204 09:51:02.549000 65714 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.2005164Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2005635Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2006513Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2007007Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2007871Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2008219Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2009098Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2009527Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2010374Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2010800Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2011646Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2012036Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2012887Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2013413Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2015221Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T10:13:48.2015584Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2016234Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2017404Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2017761Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2018472Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2019015Z [rank2]:E1204 09:51:02.550000 65715 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.2019660Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2020644Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2022062Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2022580Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2023558Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2024009Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2024968Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2025452Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2026485Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2026935Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2027830Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2028245Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2029201Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2029661Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2031291Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 611254272 and is now 10404954112. 
2025-12-04T10:13:48.2031648Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2032224Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2033228Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2033545Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2034179Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2034662Z [rank3]:E1204 09:51:02.550000 65716 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.2034778Z dist init r=2, world=4 2025-12-04T10:13:48.2034865Z dist init r=0, world=4 2025-12-04T10:13:48.2034945Z dist init r=3, world=4 2025-12-04T10:13:48.2035027Z dist init r=1, world=4 2025-12-04T10:13:48.2036052Z [rank2]:[W1204 09:51:02.065507695 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2037055Z [rank0]:[W1204 09:51:02.065938335 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2038091Z [rank3]:[W1204 09:51:02.070501706 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2039097Z [rank1]:[W1204 09:51:02.106137806 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2039187Z FAILED [27.4246s] [ 5%] 2025-12-04T10:13:48.2039193Z 2025-12-04T10:13:48.2039318Z =================================== FAILURES =================================== 2025-12-04T10:13:48.2039588Z ___ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda ___ 2025-12-04T10:13:48.2039697Z Traceback (most recent call last): 2025-12-04T10:13:48.2040178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.2040277Z self._join_processes(fn) 2025-12-04T10:13:48.2040793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.2040914Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.2041475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.2041571Z raise RuntimeError(error) 2025-12-04T10:13:48.2041777Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.2041882Z Traceback (most recent call last): 2025-12-04T10:13:48.2042353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2042454Z getattr(self, test_name)() 2025-12-04T10:13:48.2042920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2042995Z fn() 2025-12-04T10:13:48.2043468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2043558Z method(*args, **kwargs) 2025-12-04T10:13:48.2044000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2044095Z method(*args, **kwargs) 2025-12-04T10:13:48.2044529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2044616Z with policy(): 2025-12-04T10:13:48.2045061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2045154Z raise RuntimeError(msg) 2025-12-04T10:13:48.2046238Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 716111872 and is now 10516103168. 
2025-12-04T10:13:48.2046271Z 2025-12-04T10:13:48.2046459Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2047067Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2047073Z 2025-12-04T10:13:48.2047305Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2047309Z 2025-12-04T10:13:48.2047448Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.2047581Z Traceback (most recent call last): 2025-12-04T10:13:48.2048061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2048158Z getattr(self, test_name)() 2025-12-04T10:13:48.2048628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2048709Z fn() 2025-12-04T10:13:48.2049158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2049246Z method(*args, **kwargs) 2025-12-04T10:13:48.2049685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2049776Z method(*args, **kwargs) 2025-12-04T10:13:48.2050216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2050308Z with policy(): 2025-12-04T10:13:48.2050752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2050842Z raise RuntimeError(msg) 2025-12-04T10:13:48.2051919Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 611254272 and is now 10404954112. 2025-12-04T10:13:48.2051952Z 2025-12-04T10:13:48.2052137Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2052744Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2052749Z 2025-12-04T10:13:48.2052976Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2052983Z 2025-12-04T10:13:48.2052987Z 2025-12-04T10:13:48.2053181Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.2053497Z Process 0 terminated with exit code 10, terminating remaining processes. 
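
The failure itself is the CI memory-leak check tripping: the leak-check context manager in common_utils.py snapshots CUDA memory before the test and compares it afterwards, and both the caching-allocator and driver numbers in the RuntimeError grew. A rough, simplified sketch of that style of check (illustrative only, not the actual implementation used by the harness):

    import torch

    def check_for_cuda_leak(test_fn, device=0):
        # Compare allocator and driver memory before and after the test body,
        # the same kind of comparison the numbers in the RuntimeError report.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)
        free_before, total = torch.cuda.mem_get_info(device)

        test_fn()

        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        if alloc_after > alloc_before or (total - free_after) > (total - free_before):
            raise RuntimeError(
                f"possible CUDA leak on device {device}: "
                f"allocator {alloc_before} -> {alloc_after} bytes, "
                f"driver {total - free_before} -> {total - free_after} bytes"
            )

The repro line printed in the log simply re-runs the single test with that check enabled via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, and the printout can be silenced with PYTORCH_PRINT_REPRO_ON_FAILURE=0 as noted above.
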
2025-12-04T10:13:48.2054480Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-fb1563559edf316c.xml - 2025-12-04T10:13:48.2054650Z =========================== short test summary info ============================ 2025-12-04T10:13:48.2055487Z FAILED [27.4246s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.2055608Z Traceback (most recent call last): 2025-12-04T10:13:48.2056148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2056259Z getattr(self, test_name)() 2025-12-04T10:13:48.2056790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2056872Z fn() 2025-12-04T10:13:48.2057376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2057511Z method(*args, **kwargs) 2025-12-04T10:13:48.2058009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2058112Z method(*args, **kwargs) 2025-12-04T10:13:48.2058605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2058695Z with policy(): 2025-12-04T10:13:48.2059200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2059332Z raise RuntimeError(msg) 2025-12-04T10:13:48.2060548Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 716111872 and is now 10516103168. 
2025-12-04T10:13:48.2060556Z 2025-12-04T10:13:48.2060764Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2061442Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2061448Z 2025-12-04T10:13:48.2061708Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2061714Z 2025-12-04T10:13:48.2061869Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.2061991Z Traceback (most recent call last): 2025-12-04T10:13:48.2062529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2062631Z getattr(self, test_name)() 2025-12-04T10:13:48.2063168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2063256Z fn() 2025-12-04T10:13:48.2063754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2063885Z method(*args, **kwargs) 2025-12-04T10:13:48.2064382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2064487Z method(*args, **kwargs) 2025-12-04T10:13:48.2064983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2065074Z with policy(): 2025-12-04T10:13:48.2065693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2065792Z raise RuntimeError(msg) 2025-12-04T10:13:48.2067016Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 611254272 and is now 10404954112. 2025-12-04T10:13:48.2067024Z 2025-12-04T10:13:48.2067209Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2067811Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2067820Z 2025-12-04T10:13:48.2068049Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2068200Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.2068358Z ====================== 1 failed, 14 deselected in 27.64s ======================= 2025-12-04T10:13:48.2068438Z Got exit code 1 2025-12-04T10:13:48.2068528Z Retrying single test... 
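
The ProcessGroupNCCL warnings at the end of the first attempt note that destroy_process_group() was never called before the worker processes exited. A minimal teardown sketch, assuming an already-initialized default process group (illustrative, not the test harness's own code):

    import torch.distributed as dist

    def teardown():
        # Explicitly tear down the (NCCL) process group before the process exits,
        # which is what the ProcessGroupNCCL warning asks for.
        if dist.is_initialized():
            dist.barrier()
            dist.destroy_process_group()
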
2025-12-04T10:13:48.2069082Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-63a7a52cd3aa8936.xml 2025-12-04T10:13:48.2069314Z ============================= test session starts ============================== 2025-12-04T10:13:48.2069622Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.2069713Z cachedir: .pytest_cache 2025-12-04T10:13:48.2070160Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.2070267Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.2070354Z configfile: pytest.ini 2025-12-04T10:13:48.2070860Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.2071052Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.2071730Z stepcurrent: skipping 14 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2071835Z Running 1 items in this shard 2025-12-04T10:13:48.2071840Z 2025-12-04T10:13:48.2072764Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda I1204 09:51:24.570000 66910 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 66962 2025-12-04T10:13:48.2073199Z I1204 09:51:24.571000 66910 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 66963 2025-12-04T10:13:48.2073635Z I1204 09:51:24.571000 66910 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 66964 2025-12-04T10:13:48.2074063Z I1204 09:51:24.572000 66910 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 66965 2025-12-04T10:13:48.2075886Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.2075973Z _warn_cpu_init() 2025-12-04T10:13:48.2077740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.2077825Z _warn_cpu_init() 2025-12-04T10:13:48.2080079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.2080178Z _warn_cpu_init() 2025-12-04T10:13:48.2081882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2082097Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2083801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2083970Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2085654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2085859Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2087855Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.2087955Z _warn_cpu_init() 2025-12-04T10:13:48.2089662Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2089833Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2090860Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.2091207Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.2092212Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.2092426Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.2093361Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.2093766Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.2095467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2095628Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2097328Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2097525Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2098516Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.2098740Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.2099725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.2099996Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.2100982Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.2101194Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.2102191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.2102301Z return func(*args, **kwargs) 2025-12-04T10:13:48.2103283Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.2103519Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.2105216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2105412Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2106547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.2106743Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.2107425Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2107526Z return func(*args, **kwargs) 2025-12-04T10:13:48.2108233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2108326Z return func(*args, **kwargs) 2025-12-04T10:13:48.2109006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2109098Z return func(*args, **kwargs) 2025-12-04T10:13:48.2109766Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2109863Z return func(*args, **kwargs) 2025-12-04T10:13:48.2110528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2110624Z return func(*args, **kwargs) 2025-12-04T10:13:48.2111313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2111402Z return func(*args, **kwargs) 2025-12-04T10:13:48.2112072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2112161Z return func(*args, **kwargs) 2025-12-04T10:13:48.2112829Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T10:13:48.2112943Z return func(*args, **kwargs) 2025-12-04T10:13:48.2113347Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2113822Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2114706Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2115154Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2116021Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2116375Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2117221Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2117671Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2118522Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2118950Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2119794Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2120206Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2121062Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2121491Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2122971Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 718209024 and is now 10516103168. 
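
The FutureWarnings repeated above flag the `NO_SHARD` sharding strategy as deprecated and point at `DistributedDataParallel` as the replacement. Since NO_SHARD keeps full parameters on every rank, the DDP equivalent is roughly the following sketch (the names are illustrative, not the test's own wrapping code):

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_no_shard_equivalent(model, rank):
        # NO_SHARD replicates the full model on every rank, which is exactly what
        # DDP already does, so the deprecation warning suggests using DDP directly.
        model = model.cuda(rank)
        return DDP(model, device_ids=[rank])
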
2025-12-04T10:13:48.2123296Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2124243Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2125257Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2125574Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2126207Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2126715Z [rank0]:E1204 09:51:34.389000 66962 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.2127114Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2127585Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2128462Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2128909Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2129778Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2130130Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2131001Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2131426Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2132275Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2132700Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2133817Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2134263Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2135225Z [rank1]:E1204 09:51:34.390000 66963 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2135709Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2137385Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 611254272 and is now 10404954112. 2025-12-04T10:13:48.2137779Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2138434Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2139573Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2139959Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2140670Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2141213Z [rank1]:E1204 09:51:34.390000 66963 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.2141661Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2142187Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2143180Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2143688Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2144670Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2145095Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2146240Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2146664Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2147514Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2147963Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2148812Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2149199Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2150044Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2150483Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2151971Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T10:13:48.2152318Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2152895Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2154123Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2154487Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2155161Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2155671Z [rank3]:E1204 09:51:34.391000 66965 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.2156086Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2156586Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2157520Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2157996Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2158944Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2159310Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2160207Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2160658Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2161579Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2162033Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2163121Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2163545Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2164472Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2164947Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2166581Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 607059968 and is now 10404954112. 
2025-12-04T10:13:48.2166936Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2167592Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2168692Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2169039Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2169734Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2170253Z [rank2]:E1204 09:51:34.391000 66964 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.2170346Z dist init r=1, world=4 2025-12-04T10:13:48.2170443Z dist init r=0, world=4 2025-12-04T10:13:48.2170534Z dist init r=2, world=4 2025-12-04T10:13:48.2170624Z dist init r=3, world=4 2025-12-04T10:13:48.2171744Z [rank0]:[W1204 09:51:34.902705414 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2172874Z [rank1]:[W1204 09:51:34.903650570 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2174230Z [rank3]:[W1204 09:51:34.906829242 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2175366Z [rank2]:[W1204 09:51:34.908051128 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2175472Z FAILED [27.6077s] [100%] 2025-12-04T10:13:48.2175530Z 2025-12-04T10:13:48.2175670Z =================================== FAILURES =================================== 2025-12-04T10:13:48.2175979Z ___ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda ___ 2025-12-04T10:13:48.2176100Z Traceback (most recent call last): 2025-12-04T10:13:48.2176648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.2176760Z self._join_processes(fn) 2025-12-04T10:13:48.2177342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.2177480Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.2178080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.2178217Z raise RuntimeError(error) 2025-12-04T10:13:48.2178448Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.2178570Z Traceback (most recent call last): 2025-12-04T10:13:48.2179316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2179428Z getattr(self, test_name)() 2025-12-04T10:13:48.2179953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2180037Z fn() 2025-12-04T10:13:48.2180540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2180705Z method(*args, **kwargs) 2025-12-04T10:13:48.2181205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2181309Z method(*args, **kwargs) 2025-12-04T10:13:48.2181808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2181907Z with policy(): 2025-12-04T10:13:48.2182412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2182515Z raise RuntimeError(msg) 2025-12-04T10:13:48.2183738Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 718209024 and is now 10516103168. 
2025-12-04T10:13:48.2183748Z 2025-12-04T10:13:48.2183958Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2184648Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2184657Z 2025-12-04T10:13:48.2184918Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2184924Z 2025-12-04T10:13:48.2185086Z Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.2185251Z Traceback (most recent call last): 2025-12-04T10:13:48.2185798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2185909Z getattr(self, test_name)() 2025-12-04T10:13:48.2186439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2186525Z fn() 2025-12-04T10:13:48.2187031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2187130Z method(*args, **kwargs) 2025-12-04T10:13:48.2187673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2187775Z method(*args, **kwargs) 2025-12-04T10:13:48.2188277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2188378Z with policy(): 2025-12-04T10:13:48.2188882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2188985Z raise RuntimeError(msg) 2025-12-04T10:13:48.2190206Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 611254272 and is now 10404954112. 
2025-12-04T10:13:48.2190215Z 2025-12-04T10:13:48.2190423Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2191193Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2191199Z 2025-12-04T10:13:48.2191432Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2191437Z 2025-12-04T10:13:48.2191580Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.2191681Z Traceback (most recent call last): 2025-12-04T10:13:48.2192158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2192260Z getattr(self, test_name)() 2025-12-04T10:13:48.2192757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2192831Z fn() 2025-12-04T10:13:48.2193278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2193369Z method(*args, **kwargs) 2025-12-04T10:13:48.2193819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2193909Z method(*args, **kwargs) 2025-12-04T10:13:48.2194346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2194431Z with policy(): 2025-12-04T10:13:48.2195063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2195198Z raise RuntimeError(msg) 2025-12-04T10:13:48.2197229Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T10:13:48.2197251Z 2025-12-04T10:13:48.2197605Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2198849Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2198862Z 2025-12-04T10:13:48.2199320Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2199330Z 2025-12-04T10:13:48.2199338Z 2025-12-04T10:13:48.2199711Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.2200149Z Process 0 terminated with exit code 10, terminating remaining processes. 
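The failure above comes from PyTorch's CUDA memory-leak check (the `policy()` context manager in common_utils.py that appears in every traceback): it records caching-allocator and driver-level memory on each device before the test body and compares again afterwards, so the jump from 512 to 166400 allocator bytes and from roughly 0.6 GB to 10.4 GB of driver-allocated memory is what gets reported as a leak. A minimal sketch of that bookkeeping for a single device, using only public torch.cuda calls; the helper name and tolerance are illustrative and not the harness's actual implementation:

import torch

def check_for_cuda_leak(test_fn, device=0, tolerance_bytes=0):
    # Snapshot memory before the test body, mirroring the leak checker's idea.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)      # caching-allocator bytes
    free_before, total = torch.cuda.mem_get_info(device)    # driver-level view
    driver_before = total - free_before

    test_fn()

    # Snapshot again after the test body and compare both views.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after

    if (alloc_after - alloc_before > tolerance_bytes
            and driver_after - driver_before > tolerance_bytes):
        raise RuntimeError(
            f"possible CUDA leak: allocator {alloc_before} -> {alloc_after}, "
            f"driver {driver_before} -> {driver_after}"
        )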
2025-12-04T10:13:48.2201427Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-63a7a52cd3aa8936.xml - 2025-12-04T10:13:48.2201711Z =========================== short test summary info ============================ 2025-12-04T10:13:48.2203214Z FAILED [27.6077s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.2203428Z Traceback (most recent call last): 2025-12-04T10:13:48.2204326Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2204498Z getattr(self, test_name)() 2025-12-04T10:13:48.2205401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2205716Z fn() 2025-12-04T10:13:48.2206632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2206819Z method(*args, **kwargs) 2025-12-04T10:13:48.2207693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2207960Z method(*args, **kwargs) 2025-12-04T10:13:48.2208865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2209027Z with policy(): 2025-12-04T10:13:48.2209910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2210093Z raise RuntimeError(msg) 2025-12-04T10:13:48.2212356Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 718209024 and is now 10516103168. 
2025-12-04T10:13:48.2212453Z 2025-12-04T10:13:48.2212843Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2214482Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2214507Z 2025-12-04T10:13:48.2215010Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2215024Z 2025-12-04T10:13:48.2215320Z Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.2215550Z Traceback (most recent call last): 2025-12-04T10:13:48.2216557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2216744Z getattr(self, test_name)() 2025-12-04T10:13:48.2217731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2217890Z fn() 2025-12-04T10:13:48.2218833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2219010Z method(*args, **kwargs) 2025-12-04T10:13:48.2219907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2220097Z method(*args, **kwargs) 2025-12-04T10:13:48.2221109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2221271Z with policy(): 2025-12-04T10:13:48.2222226Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2222412Z raise RuntimeError(msg) 2025-12-04T10:13:48.2224740Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 611254272 and is now 10404954112. 
2025-12-04T10:13:48.2224760Z 2025-12-04T10:13:48.2225340Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2226600Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2226615Z 2025-12-04T10:13:48.2227064Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2227074Z 2025-12-04T10:13:48.2227322Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.2227510Z Traceback (most recent call last): 2025-12-04T10:13:48.2228417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2228602Z getattr(self, test_name)() 2025-12-04T10:13:48.2229534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2229662Z fn() 2025-12-04T10:13:48.2230474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2230715Z method(*args, **kwargs) 2025-12-04T10:13:48.2231550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2231721Z method(*args, **kwargs) 2025-12-04T10:13:48.2232466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2232612Z with policy(): 2025-12-04T10:13:48.2233257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2233432Z raise RuntimeError(msg) 2025-12-04T10:13:48.2234588Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T10:13:48.2234599Z 2025-12-04T10:13:48.2234797Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2235443Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2235449Z 2025-12-04T10:13:48.2235692Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2235854Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.2236022Z ====================== 1 failed, 32 deselected in 27.83s ======================= 2025-12-04T10:13:48.2236111Z Got exit code 1 2025-12-04T10:13:48.2236208Z Retrying single test... 
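Each failure block ends with the same repro instructions: run the test file directly with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 to re-enable the leak checker, and optionally set PYTORCH_PRINT_REPRO_ON_FAILURE=0 to silence the repro banner. A small sketch of driving that command from Python instead of the shell; the wrapper function is illustrative and not part of the test harness:

import os
import subprocess

def run_repro(repo_root="."):
    # Environment variables and command taken from the repro instructions in the log.
    env = dict(os.environ)
    env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"
    # env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"  # uncomment to suppress the repro banner
    cmd = [
        "python",
        "test/distributed/fsdp/test_fsdp_core.py",
        "TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda",
    ]
    return subprocess.run(cmd, cwd=repo_root, env=env).returncode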
2025-12-04T10:13:48.2236796Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a6324f00d63e140d.xml 2025-12-04T10:13:48.2236945Z ============================= test session starts ============================== 2025-12-04T10:13:48.2237273Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.2237421Z cachedir: .pytest_cache 2025-12-04T10:13:48.2237904Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.2238019Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.2238112Z configfile: pytest.ini 2025-12-04T10:13:48.2238615Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.2238815Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.2239532Z stepcurrent: skipping 14 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2239683Z Running 1 items in this shard 2025-12-04T10:13:48.2239689Z 2025-12-04T10:13:48.2240666Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda I1204 09:51:56.619000 68159 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 68211 2025-12-04T10:13:48.2241135Z I1204 09:51:56.620000 68159 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 68212 2025-12-04T10:13:48.2241593Z I1204 09:51:56.621000 68159 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 68213 2025-12-04T10:13:48.2242049Z I1204 09:51:56.622000 68159 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 68214 2025-12-04T10:13:48.2243939Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.2244063Z _warn_cpu_init() 2025-12-04T10:13:48.2245951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.2246066Z _warn_cpu_init() 2025-12-04T10:13:48.2247948Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.2248036Z _warn_cpu_init() 2025-12-04T10:13:48.2249632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2249789Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2251503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2251649Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2253144Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2253389Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2255559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.2255660Z _warn_cpu_init() 2025-12-04T10:13:48.2257353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2257520Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2258506Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.2258783Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.2259764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.2259997Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.2261695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2261888Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2262889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.2263120Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.2264829Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2264989Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2266080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.2266302Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.2267178Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.2267370Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.2268241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.2268428Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.2269330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.2269429Z return func(*args, **kwargs) 2025-12-04T10:13:48.2270300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.2270507Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.2272012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2272179Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2273053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.2273245Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.2273921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2274048Z return func(*args, **kwargs) 2025-12-04T10:13:48.2274721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2274814Z return func(*args, **kwargs) 2025-12-04T10:13:48.2275485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2275577Z return func(*args, **kwargs) 2025-12-04T10:13:48.2276253Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2276345Z return func(*args, **kwargs) 2025-12-04T10:13:48.2277006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2277103Z return func(*args, **kwargs) 2025-12-04T10:13:48.2277766Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2277864Z return func(*args, **kwargs) 2025-12-04T10:13:48.2278553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2278799Z return func(*args, **kwargs) 2025-12-04T10:13:48.2279705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T10:13:48.2279807Z return func(*args, **kwargs) 2025-12-04T10:13:48.2280273Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2280808Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2281867Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2282379Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2283359Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2283755Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2284705Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2285233Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2286189Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2286666Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2287618Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2288112Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2289075Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2289565Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2291232Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T10:13:48.2291667Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2292244Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2293351Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2293674Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2294728Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2296135Z [rank1]:E1204 09:52:06.490000 68212 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.2297263Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2298474Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2300144Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2301795Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2303410Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2304974Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2306549Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2308108Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2309630Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2311135Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2312687Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2314168Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2315763Z [rank0]:E1204 09:52:06.492000 68211 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2317236Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2319375Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 720306176 and is now 10516103168. 2025-12-04T10:13:48.2321395Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2322503Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2324300Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2325829Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2326961Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2328295Z [rank0]:E1204 09:52:06.492000 68211 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.2329351Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2330393Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2331939Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2334273Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2336335Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2337939Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2339433Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2340999Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2342571Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2344178Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2345853Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2347317Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2348678Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2350074Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2352108Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T10:13:48.2354047Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2355068Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2356764Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2358204Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2359299Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2360531Z [rank3]:E1204 09:52:06.492000 68214 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.2361517Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2362500Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2363969Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2365421Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2366889Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2368223Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2369534Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2370923Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2372345Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2374274Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2375843Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2377374Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2379126Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2380716Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2383077Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 609157120 and is now 10404954112. 
2025-12-04T10:13:48.2385215Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2386359Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2388282Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2389942Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2391300Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2392526Z [rank2]:E1204 09:52:06.493000 68213 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.2393214Z dist init r=0, world=4 2025-12-04T10:13:48.2393454Z dist init r=3, world=4 2025-12-04T10:13:48.2393680Z dist init r=2, world=4 2025-12-04T10:13:48.2393914Z dist init r=1, world=4 2025-12-04T10:13:48.2395091Z [rank3]:[W1204 09:52:06.006045474 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2397285Z [rank0]:[W1204 09:52:06.006947976 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2399419Z [rank2]:[W1204 09:52:06.009880326 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2401539Z [rank1]:[W1204 09:52:06.023488742 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2402789Z FAILED [27.3604s] [100%] 2025-12-04T10:13:48.2402956Z 2025-12-04T10:13:48.2403087Z =================================== FAILURES =================================== 2025-12-04T10:13:48.2403616Z ___ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda ___ 2025-12-04T10:13:48.2404105Z Traceback (most recent call last): 2025-12-04T10:13:48.2404796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.2405487Z self._join_processes(fn) 2025-12-04T10:13:48.2406178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.2407118Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.2407929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.2408725Z raise RuntimeError(error) 2025-12-04T10:13:48.2409124Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.2409580Z Traceback (most recent call last): 2025-12-04T10:13:48.2410302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2411063Z getattr(self, test_name)() 2025-12-04T10:13:48.2411743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2412450Z fn() 2025-12-04T10:13:48.2413037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2413968Z method(*args, **kwargs) 2025-12-04T10:13:48.2414663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2415393Z method(*args, **kwargs) 2025-12-04T10:13:48.2416078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2416837Z with policy(): 2025-12-04T10:13:48.2417499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2418243Z raise RuntimeError(msg) 2025-12-04T10:13:48.2419645Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T10:13:48.2420977Z 2025-12-04T10:13:48.2421187Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2422218Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2423039Z 2025-12-04T10:13:48.2423300Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2423731Z 2025-12-04T10:13:48.2423736Z 2025-12-04T10:13:48.2423956Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.2424561Z Process 2 terminated with exit code 10, terminating remaining processes. 
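Every run also exits with the ProcessGroupNCCL warning that destroy_process_group() was not called before program exit. In a standalone script the usual remedy is to tear the group down explicitly in a finally block; a sketch, assuming the script is launched with torchrun so RANK and the rendezvous variables are already in the environment:

import os
import torch
import torch.distributed as dist

def main():
    rank = int(os.environ["RANK"])
    torch.cuda.set_device(rank % torch.cuda.device_count())
    dist.init_process_group(backend="nccl")
    try:
        pass  # training / test body goes here
    finally:
        dist.destroy_process_group()  # explicit teardown, as the warning recommends

if __name__ == "__main__":
    main()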
2025-12-04T10:13:48.2425742Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a6324f00d63e140d.xml - 2025-12-04T10:13:48.2426836Z =========================== short test summary info ============================ 2025-12-04T10:13:48.2427852Z FAILED [27.3604s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.2428831Z Traceback (most recent call last): 2025-12-04T10:13:48.2429515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2430206Z getattr(self, test_name)() 2025-12-04T10:13:48.2430859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2431516Z fn() 2025-12-04T10:13:48.2432070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2432722Z method(*args, **kwargs) 2025-12-04T10:13:48.2433325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2433981Z method(*args, **kwargs) 2025-12-04T10:13:48.2434594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2435242Z with policy(): 2025-12-04T10:13:48.2435819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2436482Z raise RuntimeError(msg) 2025-12-04T10:13:48.2437756Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T10:13:48.2438938Z 2025-12-04T10:13:48.2439133Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2440027Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2440750Z 2025-12-04T10:13:48.2440981Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2441486Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
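The retried run repeats two FSDP initialization warnings: the wrapped module is still on CPU when sharding starts, and `device_id` is passed as a bare "cuda" with no index, so FSDP falls back to whatever the current device happens to be. Both warnings go away when each rank pins its device first and hands FSDP an explicit device; a sketch with a placeholder module, assuming the NCCL process group is already initialized:

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(rank: int):
    # Pin this rank's device so "cuda" resolves to an explicit index.
    torch.cuda.set_device(rank)
    model = torch.nn.Linear(1024, 1024)  # placeholder module, created on CPU
    # An indexed device_id lets FSDP move the module to GPU for sharding init.
    return FSDP(model, device_id=torch.device("cuda", rank))

# e.g. in the per-rank entry point, after dist.init_process_group(...):
# fsdp_model = wrap_model(dist.get_rank())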
2025-12-04T10:13:48.2441941Z ====================== 1 failed, 32 deselected in 27.58s ======================= 2025-12-04T10:13:48.2442296Z Got exit code 1 2025-12-04T10:13:48.2442966Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T10:13:48.2443977Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.2444999Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0cab4f0cffa47b1f.xml 2025-12-04T10:13:48.2445801Z ============================= test session starts ============================== 2025-12-04T10:13:48.2446369Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.2446883Z cachedir: .pytest_cache 2025-12-04T10:13:48.2447485Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.2448189Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.2448485Z configfile: pytest.ini 2025-12-04T10:13:48.2449109Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.2449875Z collecting ... collected 60 items / 15 deselected / 45 selected 2025-12-04T10:13:48.2450300Z stepcurrent: skipping 15 already run items. 2025-12-04T10:13:48.2450626Z Running 18 items in this shard 2025-12-04T10:13:48.2450805Z 2025-12-04T10:13:48.2451719Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda I1204 09:52:28.659000 69408 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 69460 2025-12-04T10:13:48.2453260Z I1204 09:52:28.660000 69408 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 69461 2025-12-04T10:13:48.2454505Z I1204 09:52:28.661000 69408 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 69462 2025-12-04T10:13:48.2455613Z I1204 09:52:28.662000 69408 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 69463 2025-12-04T10:13:48.2458239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.2460452Z _warn_cpu_init() 2025-12-04T10:13:48.2462624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.2464837Z _warn_cpu_init() 2025-12-04T10:13:48.2466952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.2468905Z _warn_cpu_init() 2025-12-04T10:13:48.2469970Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.2471067Z _init_core_state( 2025-12-04T10:13:48.2472101Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.2473192Z _init_core_state( 2025-12-04T10:13:48.2474220Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.2475309Z _init_core_state( 2025-12-04T10:13:48.2476948Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2478876Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2481028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2483565Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2487256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2490860Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2495560Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.2499688Z _warn_cpu_init() 2025-12-04T10:13:48.2501950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.2504431Z _init_core_state( 2025-12-04T10:13:48.2508257Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2511565Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2514919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2518490Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2521277Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2523079Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2524210Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.2525511Z return func(*args, **kwargs) 2025-12-04T10:13:48.2527305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2529282Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2530280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2531224Z return func(*args, **kwargs) 2025-12-04T10:13:48.2532139Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2533117Z return func(*args, **kwargs) 2025-12-04T10:13:48.2534315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T10:13:48.2535315Z return func(*args, **kwargs) 2025-12-04T10:13:48.2536277Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2537272Z return func(*args, **kwargs) 2025-12-04T10:13:48.2538218Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2539204Z return func(*args, **kwargs) 2025-12-04T10:13:48.2540157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2541139Z return func(*args, **kwargs) 2025-12-04T10:13:48.2542087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2543079Z return func(*args, **kwargs) 2025-12-04T10:13:48.2544074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2545068Z return func(*args, **kwargs) 2025-12-04T10:13:48.2545822Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2546933Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2548438Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2549889Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2551323Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2552661Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2553970Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2555364Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2556794Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2558180Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2559567Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2560945Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2562303Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2563709Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2565721Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T10:13:48.2567614Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2568644Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2570336Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.2571797Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2572863Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2574393Z [rank3]:E1204 09:52:38.794000 69463 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.2575520Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2576632Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2578317Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2580158Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2581780Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2583291Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2584780Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T10:13:48.2586405Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2588967Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2590668Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2592154Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2593684Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2595125Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2596622Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2598843Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T10:13:48.2600745Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2601768Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2603495Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.2604920Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2605994Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2607233Z [rank1]:E1204 09:52:38.795000 69461 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.2608274Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2609255Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2610721Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2612163Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2613869Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2615387Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2616914Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2618489Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2620063Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2621664Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2623229Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2624757Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2626348Z [rank2]:E1204 09:52:38.795000 69462 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2627748Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2629760Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T10:13:48.2631650Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2632707Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2634398Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.2635829Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2636904Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2638155Z [rank2]:E1204 09:52:38.795000 69462 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.2639153Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2640136Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2641599Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2643040Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2644477Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2645843Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2647162Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2648559Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2649946Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2651363Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2652754Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2654435Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2655969Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2657545Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2659816Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 716111872 and is now 10516103168. 2025-12-04T10:13:48.2661980Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2663128Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2665032Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.2666792Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2667949Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2669260Z [rank0]:E1204 09:52:38.799000 69460 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.2669991Z dist init r=3, world=4 2025-12-04T10:13:48.2670239Z dist init r=1, world=4 2025-12-04T10:13:48.2670483Z dist init r=2, world=4 2025-12-04T10:13:48.2670730Z dist init r=0, world=4 2025-12-04T10:13:48.2671965Z [rank3]:[W1204 09:52:39.309227514 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2674226Z [rank1]:[W1204 09:52:39.310483328 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2676521Z [rank2]:[W1204 09:52:39.311911216 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2678800Z [rank0]:[W1204 09:52:39.320626950 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2680359Z FAILED [27.7579s] [ 5%] 2025-12-04T10:13:48.2680542Z 2025-12-04T10:13:48.2680694Z =================================== FAILURES =================================== 2025-12-04T10:13:48.2681274Z _____ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda _____ 2025-12-04T10:13:48.2681833Z Traceback (most recent call last): 2025-12-04T10:13:48.2682601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.2683386Z self._join_processes(fn) 2025-12-04T10:13:48.2684161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.2685012Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.2685869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.2686706Z raise RuntimeError(error) 2025-12-04T10:13:48.2687139Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.2687614Z Traceback (most recent call last): 2025-12-04T10:13:48.2688374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2689158Z getattr(self, test_name)() 2025-12-04T10:13:48.2689893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2690700Z fn() 2025-12-04T10:13:48.2691419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2692185Z method(*args, **kwargs) 2025-12-04T10:13:48.2692796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2693507Z method(*args, **kwargs) 2025-12-04T10:13:48.2694350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2695082Z with policy(): 2025-12-04T10:13:48.2695800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2696544Z raise RuntimeError(msg) 2025-12-04T10:13:48.2697941Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T10:13:48.2699275Z 2025-12-04T10:13:48.2699489Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2700497Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.2701290Z 2025-12-04T10:13:48.2701560Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2701955Z 2025-12-04T10:13:48.2701960Z 2025-12-04T10:13:48.2702178Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.2702830Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.2704019Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0cab4f0cffa47b1f.xml - 2025-12-04T10:13:48.2705118Z =========================== short test summary info ============================ 2025-12-04T10:13:48.2706307Z FAILED [27.7579s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.2707250Z Traceback (most recent call last): 2025-12-04T10:13:48.2707974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2708668Z getattr(self, test_name)() 2025-12-04T10:13:48.2709315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2709986Z fn() 2025-12-04T10:13:48.2710543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2711196Z method(*args, **kwargs) 2025-12-04T10:13:48.2711806Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2712458Z method(*args, **kwargs) 2025-12-04T10:13:48.2713067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2713707Z with policy(): 2025-12-04T10:13:48.2714294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2714952Z raise RuntimeError(msg) 2025-12-04T10:13:48.2716178Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T10:13:48.2717354Z 2025-12-04T10:13:48.2717568Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2718660Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.2719409Z 2025-12-04T10:13:48.2719655Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2720191Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:13:48.2720641Z ====================== 1 failed, 15 deselected in 27.97s ======================= 2025-12-04T10:13:48.2721014Z Got exit code 1 2025-12-04T10:13:48.2721246Z Retrying single test... 2025-12-04T10:13:48.2722370Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a997c4f2b1c679bc.xml 2025-12-04T10:13:48.2723244Z ============================= test session starts ============================== 2025-12-04T10:13:48.2723848Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.2724392Z cachedir: .pytest_cache 2025-12-04T10:13:48.2725038Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.2725754Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.2726068Z configfile: pytest.ini 2025-12-04T10:13:48.2726914Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.2727753Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.2728805Z stepcurrent: skipping 15 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.2729788Z Running 1 items in this shard 2025-12-04T10:13:48.2729985Z 2025-12-04T10:13:48.2730980Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda I1204 09:53:01.160000 70657 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 70709 2025-12-04T10:13:48.2732569Z I1204 09:53:01.161000 70657 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 70710 2025-12-04T10:13:48.2733911Z I1204 09:53:01.161000 70657 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 70711 2025-12-04T10:13:48.2735084Z I1204 09:53:01.162000 70657 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 70712 2025-12-04T10:13:48.2737724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.2739951Z _warn_cpu_init() 2025-12-04T10:13:48.2742092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.2744320Z _warn_cpu_init() 2025-12-04T10:13:48.2746635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.2748825Z _warn_cpu_init() 2025-12-04T10:13:48.2749921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.2751082Z _init_core_state( 2025-12-04T10:13:48.2752211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.2753383Z _init_core_state( 2025-12-04T10:13:48.2754481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.2755638Z _init_core_state( 2025-12-04T10:13:48.2757377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2759255Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2761289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2764687Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2768168Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2771859Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2776478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.2781041Z _warn_cpu_init() 2025-12-04T10:13:48.2783314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.2785660Z _init_core_state( 2025-12-04T10:13:48.2789098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2793005Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2796381Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2799471Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2801467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2803357Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2804560Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.2805724Z return func(*args, **kwargs) 2025-12-04T10:13:48.2807511Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2809423Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2810414Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2811361Z return func(*args, **kwargs) 2025-12-04T10:13:48.2812269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2813313Z return func(*args, **kwargs) 2025-12-04T10:13:48.2814424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T10:13:48.2815474Z return func(*args, **kwargs) 2025-12-04T10:13:48.2816435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2817432Z return func(*args, **kwargs) 2025-12-04T10:13:48.2818376Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2819361Z return func(*args, **kwargs) 2025-12-04T10:13:48.2820314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2821305Z return func(*args, **kwargs) 2025-12-04T10:13:48.2822250Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2823238Z return func(*args, **kwargs) 2025-12-04T10:13:48.2824188Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.2825178Z return func(*args, **kwargs) 2025-12-04T10:13:48.2825929Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2826925Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2828392Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2829836Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2831292Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2832629Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2833942Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2835332Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2836731Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2838114Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2839540Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2840899Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2842257Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2843676Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2845686Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 602865664 and is now 10404954112. 2025-12-04T10:13:48.2847567Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2848584Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2850267Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.2851686Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2852753Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2854337Z [rank1]:E1204 09:53:11.255000 70710 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.2855459Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2856564Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2858221Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2859879Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2861499Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2863002Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2864484Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T10:13:48.2866140Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2867530Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2868968Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2870360Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2871709Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2873094Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2874492Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2876505Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 718209024 and is now 10516103168. 
2025-12-04T10:13:48.2878399Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2879834Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2881731Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.2883341Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2884614Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2886007Z [rank0]:E1204 09:53:11.255000 70709 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.2887140Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2888246Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2889946Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2891757Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2893250Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2894878Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2896361Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2897935Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2899549Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2901115Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2902679Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2904247Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2905892Z [rank3]:E1204 09:53:11.257000 70712 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2907404Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2909413Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T10:13:48.2911292Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2912309Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2914021Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.2915439Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2917259Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2918914Z [rank3]:E1204 09:53:11.257000 70712 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.2919988Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.2921101Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.2922663Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2924200Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.2925721Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2927142Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.2928535Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2930121Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2931506Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2932902Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.2934693Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2941027Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.2942645Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2944240Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.2946551Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T10:13:48.2948438Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2949464Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2951217Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.2952628Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.2953693Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2954924Z [rank2]:E1204 09:53:11.262000 70711 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.2955642Z dist init r=3, world=4 2025-12-04T10:13:48.2955874Z dist init r=1, world=4 2025-12-04T10:13:48.2956102Z dist init r=0, world=4 2025-12-04T10:13:48.2956327Z dist init r=2, world=4 2025-12-04T10:13:48.2957491Z [rank3]:[W1204 09:53:11.767477299 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2959618Z [rank1]:[W1204 09:53:11.767935096 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2961738Z [rank0]:[W1204 09:53:11.773889619 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2963888Z [rank2]:[W1204 09:53:11.786725746 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.2965097Z FAILED [27.7655s] [100%] 2025-12-04T10:13:48.2965257Z 2025-12-04T10:13:48.2965387Z =================================== FAILURES =================================== 2025-12-04T10:13:48.2965902Z _____ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda _____ 2025-12-04T10:13:48.2966425Z Traceback (most recent call last): 2025-12-04T10:13:48.2967102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.2967780Z self._join_processes(fn) 2025-12-04T10:13:48.2968466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.2969216Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.2969973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.2970711Z raise RuntimeError(error) 2025-12-04T10:13:48.2970917Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.2971019Z Traceback (most recent call last): 2025-12-04T10:13:48.2971493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2971594Z getattr(self, test_name)() 2025-12-04T10:13:48.2972060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2972139Z fn() 2025-12-04T10:13:48.2972585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2972672Z method(*args, **kwargs) 2025-12-04T10:13:48.2973142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2973327Z method(*args, **kwargs) 2025-12-04T10:13:48.2973956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2974056Z with policy(): 2025-12-04T10:13:48.2974553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2974665Z raise RuntimeError(msg) 2025-12-04T10:13:48.2975908Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 607059968 and is now 10404954112. 
2025-12-04T10:13:48.2975918Z 2025-12-04T10:13:48.2976130Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2976797Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.2976804Z 2025-12-04T10:13:48.2977060Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2977066Z 2025-12-04T10:13:48.2977071Z 2025-12-04T10:13:48.2977290Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.2977551Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.2978352Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a997c4f2b1c679bc.xml - 2025-12-04T10:13:48.2978550Z =========================== short test summary info ============================ 2025-12-04T10:13:48.2979582Z FAILED [27.7655s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.2979707Z Traceback (most recent call last): 2025-12-04T10:13:48.2980247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.2980359Z getattr(self, test_name)() 2025-12-04T10:13:48.2980893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.2981051Z fn() 2025-12-04T10:13:48.2981558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2981658Z method(*args, **kwargs) 2025-12-04T10:13:48.2982161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.2982262Z method(*args, **kwargs) 2025-12-04T10:13:48.2982758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.2982854Z with policy(): 2025-12-04T10:13:48.2983356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.2983458Z raise RuntimeError(msg) 2025-12-04T10:13:48.2984656Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T10:13:48.2984665Z 2025-12-04T10:13:48.2984875Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.2985549Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.2985555Z 2025-12-04T10:13:48.2985857Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.2986030Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
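The failure above comes from PyTorch's CUDA memory-leak check: with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 the harness compares caching-allocator and driver-level memory before and after the test body and raises the RuntimeError shown when the numbers do not return to baseline (the repro banner can be silenced with PYTORCH_PRINT_REPRO_ON_FAILURE=0, as the log notes). The sketch below is a minimal, standalone approximation of that before/after comparison using public torch.cuda APIs; it is not the harness's actual implementation, and run_suspect_workload is a hypothetical placeholder for the test body.

import torch

def driver_allocated(device: int) -> int:
    # mem_get_info returns (free, total) bytes as seen by the CUDA driver;
    # the difference approximates the "CUDA driver allocated memory" figure in the log.
    free, total = torch.cuda.mem_get_info(device)
    return total - free

def check_for_leak(device: int, run_suspect_workload) -> None:
    # Snapshot both the caching allocator and the driver before the workload.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    caching_before = torch.cuda.memory_allocated(device)
    driver_before = driver_allocated(device)

    run_suspect_workload()  # hypothetical stand-in for the test under suspicion

    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    caching_after = torch.cuda.memory_allocated(device)
    driver_after = driver_allocated(device)

    if caching_after > caching_before:
        raise RuntimeError(
            f"Caching allocator memory grew from {caching_before} to {caching_after} "
            f"bytes on device {device}; live tensors were likely left behind."
        )
    if driver_after > driver_before:
        # Driver growth can also reflect lazily created contexts or NCCL buffers,
        # so treat it as a hint rather than proof of a tensor leak.
        print(
            f"Driver-allocated memory grew from {driver_before} to {driver_after} "
            f"bytes on device {device}."
        )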
2025-12-04T10:13:48.2986209Z ====================== 1 failed, 32 deselected in 27.98s ======================= 2025-12-04T10:13:48.2986300Z Got exit code 1 2025-12-04T10:13:48.2986399Z Retrying single test... 2025-12-04T10:13:48.2987021Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bb42278badc3bd05.xml 2025-12-04T10:13:48.2987177Z ============================= test session starts ============================== 2025-12-04T10:13:48.2987522Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.2987663Z cachedir: .pytest_cache 2025-12-04T10:13:48.2988174Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.2988298Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.2988400Z configfile: pytest.ini 2025-12-04T10:13:48.2988938Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.2989148Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.2989887Z stepcurrent: skipping 15 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.2990003Z Running 1 items in this shard 2025-12-04T10:13:48.2990009Z 2025-12-04T10:13:48.2991196Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda I1204 09:53:33.569000 71906 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 71958 2025-12-04T10:13:48.2991677Z I1204 09:53:33.570000 71906 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 71959 2025-12-04T10:13:48.2992109Z I1204 09:53:33.571000 71906 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 71960 2025-12-04T10:13:48.2992535Z I1204 09:53:33.572000 71906 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 71961 2025-12-04T10:13:48.2994329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.2994442Z _warn_cpu_init() 2025-12-04T10:13:48.2996216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.2996299Z _warn_cpu_init() 2025-12-04T10:13:48.2997206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.2997289Z _init_core_state( 2025-12-04T10:13:48.2998831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.2998979Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.2999870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.2999959Z _init_core_state( 2025-12-04T10:13:48.3001477Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3001631Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3003390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3003482Z _warn_cpu_init() 2025-12-04T10:13:48.3005242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3005356Z _warn_cpu_init() 2025-12-04T10:13:48.3006250Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.3006330Z _init_core_state( 2025-12-04T10:13:48.3007872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:48.3008014Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3008912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.3008992Z _init_core_state( 2025-12-04T10:13:48.3010494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3010640Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3012164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3012312Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3013182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.3013345Z return func(*args, **kwargs) 2025-12-04T10:13:48.3015203Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3015370Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3017064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3017221Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3017993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3018130Z return func(*args, **kwargs) 2025-12-04T10:13:48.3018897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3019004Z return func(*args, **kwargs) 2025-12-04T10:13:48.3019760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T10:13:48.3019867Z return func(*args, **kwargs) 2025-12-04T10:13:48.3020623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3020759Z return func(*args, **kwargs) 2025-12-04T10:13:48.3021506Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3021612Z return func(*args, **kwargs) 2025-12-04T10:13:48.3022363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3022465Z return func(*args, **kwargs) 2025-12-04T10:13:48.3023216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3023317Z return func(*args, **kwargs) 2025-12-04T10:13:48.3024063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3024171Z return func(*args, **kwargs) 2025-12-04T10:13:48.3024625Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3025163Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3026370Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3026817Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3027699Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3028049Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3028935Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3029364Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3030208Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3030631Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3031475Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3031895Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3032741Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3033175Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3034633Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T10:13:48.3034985Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3035566Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3036555Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.3036875Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3037506Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3037988Z [rank1]:E1204 09:53:43.767000 71959 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.3038381Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3038876Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3039755Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3040197Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3041107Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3041454Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3042304Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T10:13:48.3042731Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3043573Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3043999Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3044870Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3045265Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3046111Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3046544Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3048035Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T10:13:48.3048361Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3048938Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3049918Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.3050242Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3050874Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3051389Z [rank2]:E1204 09:53:43.768000 71960 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.3051784Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3052255Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3053130Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3053802Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3054821Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3055217Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3056170Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3056648Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3057601Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3058112Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3059068Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3059509Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3060463Z [rank3]:E1204 09:53:43.768000 71961 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3060987Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3062645Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T10:13:48.3063006Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3063657Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3064770Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.3065132Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3065980Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3066465Z [rank3]:E1204 09:53:43.768000 71961 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.3066855Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3067326Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3068232Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3068677Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3069549Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3069893Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3070738Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3071163Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3072028Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3072460Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3073299Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3073730Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3074577Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3075010Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3076463Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 707723264 and is now 10516103168. 2025-12-04T10:13:48.3076780Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3077359Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3078340Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.3078827Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3079687Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3080233Z [rank0]:E1204 09:53:43.772000 71958 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.3080331Z dist init r=1, world=4 2025-12-04T10:13:48.3080423Z dist init r=2, world=4 2025-12-04T10:13:48.3080518Z dist init r=3, world=4 2025-12-04T10:13:48.3080607Z dist init r=0, world=4 2025-12-04T10:13:48.3081815Z [rank1]:[W1204 09:53:44.281417566 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3082951Z [rank2]:[W1204 09:53:44.282028999 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3084085Z [rank3]:[W1204 09:53:44.285395148 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3085224Z [rank0]:[W1204 09:53:44.301997512 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3085358Z FAILED [28.0313s] [100%] 2025-12-04T10:13:48.3085365Z 2025-12-04T10:13:48.3085510Z =================================== FAILURES =================================== 2025-12-04T10:13:48.3085815Z _____ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda _____ 2025-12-04T10:13:48.3085934Z Traceback (most recent call last): 2025-12-04T10:13:48.3086480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.3086585Z self._join_processes(fn) 2025-12-04T10:13:48.3087170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.3087349Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.3087947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.3088062Z raise RuntimeError(error) 2025-12-04T10:13:48.3088291Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.3088410Z Traceback (most recent call last): 2025-12-04T10:13:48.3088944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3089048Z getattr(self, test_name)() 2025-12-04T10:13:48.3089579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3089663Z fn() 2025-12-04T10:13:48.3090163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3090266Z method(*args, **kwargs) 2025-12-04T10:13:48.3090763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3090873Z method(*args, **kwargs) 2025-12-04T10:13:48.3091470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3091701Z with policy(): 2025-12-04T10:13:48.3092152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3092244Z raise RuntimeError(msg) 2025-12-04T10:13:48.3093359Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 
2025-12-04T10:13:48.3093369Z 2025-12-04T10:13:48.3093556Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3094415Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.3094424Z 2025-12-04T10:13:48.3094688Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3094695Z 2025-12-04T10:13:48.3094855Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.3094976Z Traceback (most recent call last): 2025-12-04T10:13:48.3095521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3095627Z getattr(self, test_name)() 2025-12-04T10:13:48.3096163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3096250Z fn() 2025-12-04T10:13:48.3096748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3096854Z method(*args, **kwargs) 2025-12-04T10:13:48.3097382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3097487Z method(*args, **kwargs) 2025-12-04T10:13:48.3097986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3098078Z with policy(): 2025-12-04T10:13:48.3098585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3098688Z raise RuntimeError(msg) 2025-12-04T10:13:48.3099891Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T10:13:48.3099999Z 2025-12-04T10:13:48.3100209Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3100869Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.3100881Z 2025-12-04T10:13:48.3101141Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3101146Z 2025-12-04T10:13:48.3101151Z 2025-12-04T10:13:48.3101368Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.3101631Z Process 1 terminated with exit code 10, terminating remaining processes. 
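The ProcessGroupNCCL warnings above flag that destroy_process_group() was not called before the worker processes exited, and the earlier barrier() warning suggests passing device_id to init_process_group. A minimal sketch of the suggested per-rank setup and teardown, assuming a launcher such as torchrun provides RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT and LOCAL_RANK in the environment:

import os
import torch
import torch.distributed as dist

def main() -> None:
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)
    dist.init_process_group(
        backend="nccl",
        # Binding the group to an explicit device also silences the
        # "barrier(): using the device under current context" warning above.
        device_id=torch.device("cuda", local_rank),
    )
    try:
        dist.barrier()  # placeholder for the real per-rank work
    finally:
        # Explicit teardown avoids the "destroy_process_group() was not called"
        # warning and releases NCCL communicators deterministically.
        dist.destroy_process_group()

if __name__ == "__main__":
    main()

Tearing the group down in a finally block releases the communicators before interpreter shutdown, which is exactly the resource leak the warning is about.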
2025-12-04T10:13:48.3102428Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bb42278badc3bd05.xml - 2025-12-04T10:13:48.3102598Z =========================== short test summary info ============================ 2025-12-04T10:13:48.3103423Z FAILED [28.0313s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.3103542Z Traceback (most recent call last): 2025-12-04T10:13:48.3104117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3104223Z getattr(self, test_name)() 2025-12-04T10:13:48.3104750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3104838Z fn() 2025-12-04T10:13:48.3105340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3105449Z method(*args, **kwargs) 2025-12-04T10:13:48.3106043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3106132Z method(*args, **kwargs) 2025-12-04T10:13:48.3106608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3106689Z with policy(): 2025-12-04T10:13:48.3107133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3107230Z raise RuntimeError(msg) 2025-12-04T10:13:48.3108282Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 
2025-12-04T10:13:48.3108290Z 2025-12-04T10:13:48.3108481Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3109061Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.3109096Z 2025-12-04T10:13:48.3109328Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3109333Z 2025-12-04T10:13:48.3109471Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.3109573Z Traceback (most recent call last): 2025-12-04T10:13:48.3110057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3110150Z getattr(self, test_name)() 2025-12-04T10:13:48.3110612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3110717Z fn() 2025-12-04T10:13:48.3111156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3111248Z method(*args, **kwargs) 2025-12-04T10:13:48.3111688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3111777Z method(*args, **kwargs) 2025-12-04T10:13:48.3112222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3112302Z with policy(): 2025-12-04T10:13:48.3112746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3112836Z raise RuntimeError(msg) 2025-12-04T10:13:48.3113893Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T10:13:48.3113900Z 2025-12-04T10:13:48.3114089Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3114674Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.3114679Z 2025-12-04T10:13:48.3114935Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3115089Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
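The repeated FSDP UserWarnings in this run point at two related fixes: pin each rank to an indexed CUDA device with torch.cuda.set_device(), and pass that indexed device as FSDP's device_id so the sharding initialization (and sync_module_states=True) runs on GPU rather than CPU. A minimal sketch under those assumptions; the Linear module is a placeholder rather than the MoE model used by the failing test, and the rendezvous environment variables are assumed to come from the launcher.

import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

def setup_fsdp() -> FSDP:
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    device = torch.device("cuda", local_rank)
    torch.cuda.set_device(device)  # avoids the "does not have an explicit index" warning
    dist.init_process_group(backend="nccl", device_id=device)

    model = torch.nn.Linear(1024, 1024)  # placeholder module, still on CPU at this point
    return FSDP(
        model,
        device_id=device,                 # moves the module so sharding init runs on GPU
        sharding_strategy=ShardingStrategy.FULL_SHARD,
        sync_module_states=True,          # requires GPU communication, hence the GPU device_id
    )

Passing sharding_strategy explicitly does not change the fallback noted in the log: when a process group has world size 1, FSDP still switches to NO_SHARD, and state_dict calls under NO_SHARD return a full (unsharded) state dict.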
2025-12-04T10:13:48.3115242Z ====================== 1 failed, 32 deselected in 28.25s ======================= 2025-12-04T10:13:48.3115327Z Got exit code 1 2025-12-04T10:13:48.3115846Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda 2025-12-04T10:13:48.3116204Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.3116744Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1e66930a4930311d.xml 2025-12-04T10:13:48.3116911Z ============================= test session starts ============================== 2025-12-04T10:13:48.3117216Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.3117309Z cachedir: .pytest_cache 2025-12-04T10:13:48.3117760Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.3117868Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.3117956Z configfile: pytest.ini 2025-12-04T10:13:48.3118427Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.3118617Z collecting ... collected 60 items / 16 deselected / 44 selected 2025-12-04T10:13:48.3118736Z stepcurrent: skipping 16 already run items. 2025-12-04T10:13:48.3118836Z Running 17 items in this shard 2025-12-04T10:13:48.3118841Z 2025-12-04T10:13:48.3119822Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda I1204 09:54:06.110000 73155 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 73207 2025-12-04T10:13:48.3120263Z I1204 09:54:06.111000 73155 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 73208 2025-12-04T10:13:48.3120693Z I1204 09:54:06.111000 73155 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 73209 2025-12-04T10:13:48.3121120Z I1204 09:54:06.112000 73155 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 73210 2025-12-04T10:13:48.3122934Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3123018Z _warn_cpu_init() 2025-12-04T10:13:48.3124783Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.3124870Z _warn_cpu_init() 2025-12-04T10:13:48.3126629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3126737Z _warn_cpu_init() 2025-12-04T10:13:48.3127652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.3127734Z _init_core_state( 2025-12-04T10:13:48.3128637Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.3128725Z _init_core_state( 2025-12-04T10:13:48.3130258Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3130411Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3131917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3132069Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3134105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3134241Z _warn_cpu_init() 2025-12-04T10:13:48.3135276Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.3135397Z _init_core_state( 2025-12-04T10:13:48.3137093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:48.3137255Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3138278Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.3138371Z _init_core_state( 2025-12-04T10:13:48.3140067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3140232Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3141971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3142134Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3143829Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3143993Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3145015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.3145128Z return func(*args, **kwargs) 2025-12-04T10:13:48.3146819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3146962Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3147644Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3147767Z return func(*args, **kwargs) 2025-12-04T10:13:48.3148442Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3148532Z return func(*args, **kwargs) 2025-12-04T10:13:48.3149200Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
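The `torch.cuda.set_device()` hint and the barrier() warning about `device_id` in `init_process_group` both point at binding each rank to an explicit device before any collective runs. A sketch of that setup, assuming a recent PyTorch where `init_process_group` accepts `device_id` and that the launcher provides `LOCAL_RANK`; the single-node env defaults below are placeholders so the snippet runs standalone:

import os
import torch
import torch.distributed as dist

# Single-node defaults so the sketch runs under a plain `python` invocation;
# a real launcher such as torchrun sets these itself.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)                  # addresses the set_device() hint
dist.init_process_group(
    backend="nccl",
    device_id=torch.device(f"cuda:{local_rank}"),  # mutes the barrier() warning
)
dist.barrier()                                     # now bound to an explicit device
dist.destroy_process_group()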
2025-12-04T10:13:48.3149295Z return func(*args, **kwargs) 2025-12-04T10:13:48.3149961Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3150087Z return func(*args, **kwargs) 2025-12-04T10:13:48.3150751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3150847Z return func(*args, **kwargs) 2025-12-04T10:13:48.3151526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3151615Z return func(*args, **kwargs) 2025-12-04T10:13:48.3152286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3152377Z return func(*args, **kwargs) 2025-12-04T10:13:48.3153043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3153135Z return func(*args, **kwargs) 2025-12-04T10:13:48.3153536Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3154006Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3154906Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3155351Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3156228Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3156601Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3157453Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3157878Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3158727Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3159151Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3160184Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3160631Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3161533Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3161992Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3163680Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 714014720 and is now 10516103168. 2025-12-04T10:13:48.3164182Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3164801Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3165883Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3166227Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3166894Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3167409Z [rank0]:E1204 09:54:16.237000 73207 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.3167855Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3168358Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3169289Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3169758Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3170725Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3171100Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3171999Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3172449Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3173408Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3174041Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3175030Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3175477Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3176430Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3176946Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3178799Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T10:13:48.3179172Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3179825Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3180972Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3181333Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3182045Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3182649Z [rank1]:E1204 09:54:16.239000 73208 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.3183093Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3183621Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3184621Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3185157Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3186151Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3186543Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3187509Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3187992Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3188948Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3189464Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3190408Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3190937Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3191816Z [rank2]:E1204 09:54:16.239000 73209 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3192255Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3193748Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T10:13:48.3194071Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3194649Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3195668Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3195996Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3196654Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3197142Z [rank2]:E1204 09:54:16.239000 73209 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.3197538Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3198009Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3198916Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3199367Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3200245Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3200589Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3201442Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3201894Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3202743Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3203166Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3204007Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3204427Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3205272Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3205711Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3207198Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 611254272 and is now 10404954112. 2025-12-04T10:13:48.3207526Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3208101Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3209422Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3209752Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3210379Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3210866Z [rank3]:E1204 09:54:16.239000 73210 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.3210953Z dist init r=0, world=4 2025-12-04T10:13:48.3211036Z dist init r=2, world=4 2025-12-04T10:13:48.3211122Z dist init r=1, world=4 2025-12-04T10:13:48.3211232Z dist init r=3, world=4 2025-12-04T10:13:48.3212254Z [rank0]:[W1204 09:54:16.748739330 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3213325Z [rank2]:[W1204 09:54:16.750467390 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3214592Z [rank3]:[W1204 09:54:16.751871356 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3215729Z [rank1]:[W1204 09:54:16.752128708 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3215866Z FAILED [27.9672s] [ 5%] 2025-12-04T10:13:48.3215873Z 2025-12-04T10:13:48.3216018Z =================================== FAILURES =================================== 2025-12-04T10:13:48.3216334Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:48.3216457Z Traceback (most recent call last): 2025-12-04T10:13:48.3216995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.3217132Z self._join_processes(fn) 2025-12-04T10:13:48.3217718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.3217854Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.3218456Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.3218570Z raise RuntimeError(error) 2025-12-04T10:13:48.3218803Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.3218922Z Traceback (most recent call last): 2025-12-04T10:13:48.3219452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3219556Z getattr(self, test_name)() 2025-12-04T10:13:48.3220089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3220175Z fn() 2025-12-04T10:13:48.3220675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3220782Z method(*args, **kwargs) 2025-12-04T10:13:48.3221282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3221386Z method(*args, **kwargs) 2025-12-04T10:13:48.3221927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3222019Z with policy(): 2025-12-04T10:13:48.3222525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3222627Z raise RuntimeError(msg) 2025-12-04T10:13:48.3223866Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 714014720 and is now 10516103168. 
2025-12-04T10:13:48.3223875Z 2025-12-04T10:13:48.3224115Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3224813Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3224827Z 2025-12-04T10:13:48.3225088Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3225093Z 2025-12-04T10:13:48.3225248Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.3225368Z Traceback (most recent call last): 2025-12-04T10:13:48.3226012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3226109Z getattr(self, test_name)() 2025-12-04T10:13:48.3226581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3226655Z fn() 2025-12-04T10:13:48.3227135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3227222Z method(*args, **kwargs) 2025-12-04T10:13:48.3227663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3227754Z method(*args, **kwargs) 2025-12-04T10:13:48.3228190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3228270Z with policy(): 2025-12-04T10:13:48.3228720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3228840Z raise RuntimeError(msg) 2025-12-04T10:13:48.3229934Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T10:13:48.3229941Z 2025-12-04T10:13:48.3230125Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3230745Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3230755Z 2025-12-04T10:13:48.3230984Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3230989Z 2025-12-04T10:13:48.3230993Z 2025-12-04T10:13:48.3231182Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.3231418Z Process 0 terminated with exit code 10, terminating remaining processes. 
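The `dist init r=..., world=4` lines and the `exiting process N with exit code: 10` messages come from a parent process that spawns one worker per rank and fails the test when any child exits non-zero. A rough sketch of that process model with made-up names (`_worker`, `run`); the real machinery is MultiProcessTestCase in torch/testing/_internal/common_distributed.py:

import torch.multiprocessing as mp

def _worker(rank: int, world_size: int) -> None:
    # A real worker would init a process group, run the test body, then tear down.
    print(f"dist init r={rank}, world={world_size}")

def run(world_size: int = 4) -> None:
    # spawn() joins the children and raises if any of them exits non-zero,
    # which is how the exit-code-10 failures above surface in the parent.
    mp.spawn(_worker, args=(world_size,), nprocs=world_size)

if __name__ == "__main__":
    run()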
2025-12-04T10:13:48.3232116Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1e66930a4930311d.xml - 2025-12-04T10:13:48.3232271Z =========================== short test summary info ============================ 2025-12-04T10:13:48.3233061Z FAILED [27.9672s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.3233166Z Traceback (most recent call last): 2025-12-04T10:13:48.3233649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3233741Z getattr(self, test_name)() 2025-12-04T10:13:48.3234212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3234287Z fn() 2025-12-04T10:13:48.3234730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3234823Z method(*args, **kwargs) 2025-12-04T10:13:48.3235294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3235381Z method(*args, **kwargs) 2025-12-04T10:13:48.3235829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3235911Z with policy(): 2025-12-04T10:13:48.3236358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3236449Z raise RuntimeError(msg) 2025-12-04T10:13:48.3237532Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 714014720 and is now 10516103168. 
2025-12-04T10:13:48.3237539Z 2025-12-04T10:13:48.3237731Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3238381Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3238388Z 2025-12-04T10:13:48.3238620Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3238624Z 2025-12-04T10:13:48.3238764Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.3238866Z Traceback (most recent call last): 2025-12-04T10:13:48.3239346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3239469Z getattr(self, test_name)() 2025-12-04T10:13:48.3239944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3240018Z fn() 2025-12-04T10:13:48.3240458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3240555Z method(*args, **kwargs) 2025-12-04T10:13:48.3240997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3241087Z method(*args, **kwargs) 2025-12-04T10:13:48.3241534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3241615Z with policy(): 2025-12-04T10:13:48.3242065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3242159Z raise RuntimeError(msg) 2025-12-04T10:13:48.3243250Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T10:13:48.3243256Z 2025-12-04T10:13:48.3243446Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3244091Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3244097Z 2025-12-04T10:13:48.3244331Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3244484Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.3244638Z ====================== 1 failed, 16 deselected in 28.19s ======================= 2025-12-04T10:13:48.3244724Z Got exit code 1 2025-12-04T10:13:48.3244811Z Retrying single test... 
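The RuntimeError itself is raised by the mem_leak_check wrapper (enabled through PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, as the repro command shows): it snapshots caching-allocator and driver memory around the test and fails when either grows. A simplified stand-in for that check, using public torch.cuda counters and a made-up name `check_leak` rather than the internal implementation:

import torch

def check_leak(fn, device: int = 0) -> None:
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)
    fn()
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    if alloc_after > alloc_before or free_after < free_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: allocator "
            f"{alloc_before} -> {alloc_after} bytes, driver "
            f"{total - free_before} -> {total - free_after} bytes"
        )

In the failure above the allocator count grew from 512 to 80384 bytes and driver-allocated memory from roughly 0.7 GB to 10.5 GB on device 0, which is what trips the check.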
2025-12-04T10:13:48.3245368Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-caca850dfa53af0d.xml 2025-12-04T10:13:48.3245533Z ============================= test session starts ============================== 2025-12-04T10:13:48.3245836Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.3245933Z cachedir: .pytest_cache 2025-12-04T10:13:48.3246384Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.3246485Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.3246578Z configfile: pytest.ini 2025-12-04T10:13:48.3247044Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.3247238Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.3247923Z stepcurrent: skipping 16 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3248060Z Running 1 items in this shard 2025-12-04T10:13:48.3248065Z 2025-12-04T10:13:48.3249009Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda I1204 09:54:38.590000 74404 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 74456 2025-12-04T10:13:48.3249448Z I1204 09:54:38.590000 74404 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 74457 2025-12-04T10:13:48.3249885Z I1204 09:54:38.591000 74404 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 74458 2025-12-04T10:13:48.3250340Z I1204 09:54:38.592000 74404 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 74459 2025-12-04T10:13:48.3252130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3252216Z _warn_cpu_init() 2025-12-04T10:13:48.3254260Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3254364Z _warn_cpu_init() 2025-12-04T10:13:48.3255398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 
2025-12-04T10:13:48.3255500Z _init_core_state( 2025-12-04T10:13:48.3256562Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.3256658Z _init_core_state( 2025-12-04T10:13:48.3258364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3258527Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3260259Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3260419Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3262423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3262549Z _warn_cpu_init() 2025-12-04T10:13:48.3264543Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3264636Z _warn_cpu_init() 2025-12-04T10:13:48.3265661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.3265797Z _init_core_state( 2025-12-04T10:13:48.3267424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3267575Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3268485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 
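The `NO_SHARD` warnings fire because some FSDP instances in this mixture-of-experts test appear to be constructed over a process group of size 1 (per-rank expert parameters), where SHARD_GRAD_OP has nothing to shard. Requesting the strategy explicitly looks like the sketch below, with a made-up helper name and not the test's code; FSDP still downgrades it to NO_SHARD whenever the wrapping group has world size 1:

import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

def wrap_shard_grad_op(module: nn.Module) -> FSDP:
    # ZeRO-2-style: shard gradients and optimizer state but keep parameters
    # gathered after forward; with world size 1 this silently becomes NO_SHARD.
    return FSDP(module, sharding_strategy=ShardingStrategy.SHARD_GRAD_OP)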
2025-12-04T10:13:48.3268571Z _init_core_state( 2025-12-04T10:13:48.3270071Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3270217Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3271732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3271876Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3273395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3273536Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3275033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3275171Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3276050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.3276144Z return func(*args, **kwargs) 2025-12-04T10:13:48.3276857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3276951Z return func(*args, **kwargs) 2025-12-04T10:13:48.3277627Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3277723Z return func(*args, **kwargs) 2025-12-04T10:13:48.3278399Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3278521Z return func(*args, **kwargs) 2025-12-04T10:13:48.3279536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T10:13:48.3279645Z return func(*args, **kwargs) 2025-12-04T10:13:48.3280407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3280516Z return func(*args, **kwargs) 2025-12-04T10:13:48.3281262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3281371Z return func(*args, **kwargs) 2025-12-04T10:13:48.3282121Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3282231Z return func(*args, **kwargs) 2025-12-04T10:13:48.3282981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3283089Z return func(*args, **kwargs) 2025-12-04T10:13:48.3283552Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3284146Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3285154Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3285660Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3286693Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3287088Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3288046Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3288533Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3289478Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3289967Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3290957Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3291507Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3292353Z [rank1]:E1204 09:54:48.724000 74457 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3292783Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3294657Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T10:13:48.3295022Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3295683Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3296840Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3297206Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3297918Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3298502Z [rank1]:E1204 09:54:48.724000 74457 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.3298957Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3299486Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3300487Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3300991Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3302012Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3302404Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3303351Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3303846Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3304800Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3305381Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3306386Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3306785Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3307631Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3308084Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3309586Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 611254272 and is now 10404954112. 2025-12-04T10:13:48.3309903Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3310492Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3311512Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3311837Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3312490Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3312969Z [rank3]:E1204 09:54:48.725000 74459 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.3313370Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3313836Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3314740Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3315185Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3316053Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3316401Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3317242Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3317673Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3318554Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3318984Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3319829Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3320248Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3321095Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3321522Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3323018Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T10:13:48.3323339Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3323924Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3324947Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3325292Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3325922Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3326396Z [rank2]:E1204 09:54:48.725000 74458 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.3326797Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3327285Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3328171Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3328613Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3329485Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3329834Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3330679Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3331142Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3331986Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3332413Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3333333Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3333914Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3334886Z [rank0]:E1204 09:54:48.730000 74456 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3335370Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3337071Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 716111872 and is now 10516103168. 2025-12-04T10:13:48.3337433Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3338101Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3339289Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3339650Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3340357Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3340899Z [rank0]:E1204 09:54:48.730000 74456 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.3341002Z dist init r=1, world=4 2025-12-04T10:13:48.3341127Z dist init r=2, world=4 2025-12-04T10:13:48.3341220Z dist init r=0, world=4 2025-12-04T10:13:48.3341316Z dist init r=3, world=4 2025-12-04T10:13:48.3342460Z [rank1]:[W1204 09:54:49.239727612 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3343599Z [rank0]:[W1204 09:54:49.244294770 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3344728Z [rank2]:[W1204 09:54:49.245949433 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3346006Z [rank3]:[W1204 09:54:49.250986122 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3346091Z FAILED [28.0559s] [100%] 2025-12-04T10:13:48.3346097Z 2025-12-04T10:13:48.3346220Z =================================== FAILURES =================================== 2025-12-04T10:13:48.3346510Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:48.3346612Z Traceback (most recent call last): 2025-12-04T10:13:48.3347136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.3347233Z self._join_processes(fn) 2025-12-04T10:13:48.3347747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.3347882Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.3348415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.3348517Z raise RuntimeError(error) 2025-12-04T10:13:48.3348721Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.3348823Z Traceback (most recent call last): 2025-12-04T10:13:48.3349299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3349397Z getattr(self, test_name)() 2025-12-04T10:13:48.3349862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3349944Z fn() 2025-12-04T10:13:48.3350391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3350489Z method(*args, **kwargs) 2025-12-04T10:13:48.3350955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3351045Z method(*args, **kwargs) 2025-12-04T10:13:48.3351493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3351574Z with policy(): 2025-12-04T10:13:48.3352019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3352121Z raise RuntimeError(msg) 2025-12-04T10:13:48.3353235Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 
2025-12-04T10:13:48.3353243Z 2025-12-04T10:13:48.3353437Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3354059Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3354064Z 2025-12-04T10:13:48.3354298Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3354302Z 2025-12-04T10:13:48.3354442Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.3354543Z Traceback (most recent call last): 2025-12-04T10:13:48.3355030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3355124Z getattr(self, test_name)() 2025-12-04T10:13:48.3355601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3355707Z fn() 2025-12-04T10:13:48.3356147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3356241Z method(*args, **kwargs) 2025-12-04T10:13:48.3356680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3356770Z method(*args, **kwargs) 2025-12-04T10:13:48.3357216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3357325Z with policy(): 2025-12-04T10:13:48.3357768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3357864Z raise RuntimeError(msg) 2025-12-04T10:13:48.3358962Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 611254272 and is now 10404954112. 2025-12-04T10:13:48.3358969Z 2025-12-04T10:13:48.3359163Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3359782Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3359787Z 2025-12-04T10:13:48.3360020Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3360027Z 2025-12-04T10:13:48.3360031Z 2025-12-04T10:13:48.3360224Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.3360451Z Process 1 terminated with exit code 10, terminating remaining processes. 
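Editor's note on the ProcessGroupNCCL warnings above ("destroy_process_group() was not called before program exit"): the sketch below only illustrates the explicit teardown pattern the warning asks for; it is not the test harness's actual code, and the run_worker name, rendezvous settings, and rank/world-size plumbing are assumed for illustration.

import os
import torch
import torch.distributed as dist

def run_worker(rank: int, world_size: int) -> None:
    # Illustrative rendezvous settings; a real harness wires these up itself.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    torch.cuda.set_device(rank)  # pin the device before any NCCL work
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        pass  # test body would run here
    finally:
        # Explicit teardown; skipping this is what triggers the warning in the log.
        dist.destroy_process_group()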
2025-12-04T10:13:48.3361164Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-caca850dfa53af0d.xml - 2025-12-04T10:13:48.3361311Z =========================== short test summary info ============================ 2025-12-04T10:13:48.3362105Z FAILED [28.0559s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.3362208Z Traceback (most recent call last): 2025-12-04T10:13:48.3362690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3362791Z getattr(self, test_name)() 2025-12-04T10:13:48.3363261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3363341Z fn() 2025-12-04T10:13:48.3363820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3363912Z method(*args, **kwargs) 2025-12-04T10:13:48.3364359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3364446Z method(*args, **kwargs) 2025-12-04T10:13:48.3364885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3364971Z with policy(): 2025-12-04T10:13:48.3365416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3365518Z raise RuntimeError(msg) 2025-12-04T10:13:48.3366610Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 
2025-12-04T10:13:48.3366641Z 2025-12-04T10:13:48.3366826Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3367458Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3367463Z 2025-12-04T10:13:48.3367693Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3367697Z 2025-12-04T10:13:48.3367844Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.3367946Z Traceback (most recent call last): 2025-12-04T10:13:48.3368453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3368552Z getattr(self, test_name)() 2025-12-04T10:13:48.3369024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3369108Z fn() 2025-12-04T10:13:48.3369550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3369641Z method(*args, **kwargs) 2025-12-04T10:13:48.3370084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3370171Z method(*args, **kwargs) 2025-12-04T10:13:48.3370614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3370695Z with policy(): 2025-12-04T10:13:48.3371144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3371244Z raise RuntimeError(msg) 2025-12-04T10:13:48.3372330Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 611254272 and is now 10404954112. 2025-12-04T10:13:48.3372337Z 2025-12-04T10:13:48.3372552Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3373178Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3373182Z 2025-12-04T10:13:48.3373470Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3373804Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.3373982Z ====================== 1 failed, 32 deselected in 28.27s ======================= 2025-12-04T10:13:48.3374080Z Got exit code 1 2025-12-04T10:13:48.3374178Z Retrying single test... 
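Editor's note on the repeated RuntimeError: with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, the test wrapper snapshots per-device memory before the test and compares it afterwards, reporting both the caching-allocator figure (512 -> 80384 bytes here) and the driver-level figure (~0.6 GB -> ~10.4 GB here). The following is only a rough sketch of that idea under those assumptions, not the actual check in common_utils.py; the function name is made up.

import torch

def assert_no_cuda_leak(fn, device: int = 0) -> None:
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes
    free_before, total = torch.cuda.mem_get_info(device)
    driver_before = total - free_before                  # driver-level bytes in use

    fn()  # run the test body, e.g. the FSDP mixture-of-experts case above

    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()                              # release cached blocks first
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after
    if alloc_after > alloc_before or driver_after > driver_before:
        raise RuntimeError(
            f"possible CUDA leak: allocator {alloc_before} -> {alloc_after} bytes, "
            f"driver {driver_before} -> {driver_after} bytes"
        )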
2025-12-04T10:13:48.3374832Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5ab6f8f72e0857a0.xml 2025-12-04T10:13:48.3375001Z ============================= test session starts ============================== 2025-12-04T10:13:48.3375347Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.3375449Z cachedir: .pytest_cache 2025-12-04T10:13:48.3375965Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.3376081Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.3376188Z configfile: pytest.ini 2025-12-04T10:13:48.3376715Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.3376925Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.3377709Z stepcurrent: skipping 16 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3377848Z Running 1 items in this shard 2025-12-04T10:13:48.3377854Z 2025-12-04T10:13:48.3379140Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda I1204 09:55:11.150000 75653 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 75705 2025-12-04T10:13:48.3379637Z I1204 09:55:11.150000 75653 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 75706 2025-12-04T10:13:48.3380124Z I1204 09:55:11.151000 75653 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 75707 2025-12-04T10:13:48.3380681Z I1204 09:55:11.152000 75653 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 75708 2025-12-04T10:13:48.3382693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3382795Z _warn_cpu_init() 2025-12-04T10:13:48.3384769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3384872Z _warn_cpu_init() 2025-12-04T10:13:48.3386901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3387000Z _warn_cpu_init() 2025-12-04T10:13:48.3388034Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.3388129Z _init_core_state( 2025-12-04T10:13:48.3389199Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.3389297Z _init_core_state( 2025-12-04T10:13:48.3390439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.3390528Z _init_core_state( 2025-12-04T10:13:48.3392164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3392314Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3393818Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3394001Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3395504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3395681Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3397454Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3397543Z _warn_cpu_init() 2025-12-04T10:13:48.3398448Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 
2025-12-04T10:13:48.3398532Z _init_core_state( 2025-12-04T10:13:48.3400044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3400210Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3401723Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3401863Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3403390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3403532Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3404410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.3404505Z return func(*args, **kwargs) 2025-12-04T10:13:48.3405999Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3406175Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3406855Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3406954Z return func(*args, **kwargs) 2025-12-04T10:13:48.3407629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3407720Z return func(*args, **kwargs) 2025-12-04T10:13:48.3408400Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3408521Z return func(*args, **kwargs) 2025-12-04T10:13:48.3409198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T10:13:48.3409289Z return func(*args, **kwargs) 2025-12-04T10:13:48.3409957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3410053Z return func(*args, **kwargs) 2025-12-04T10:13:48.3410718Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3410812Z return func(*args, **kwargs) 2025-12-04T10:13:48.3411474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3411563Z return func(*args, **kwargs) 2025-12-04T10:13:48.3412234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3412324Z return func(*args, **kwargs) 2025-12-04T10:13:48.3412772Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3413299Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3414423Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3414934Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3415947Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3416352Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3417307Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3417792Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3418746Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3419254Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3420212Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3420650Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3421607Z [rank2]:E1204 09:55:21.196000 75707
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3422120Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3423811Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T10:13:48.3424169Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3424819Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3426169Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3426490Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3427159Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3427637Z [rank2]:E1204 09:55:21.196000 75707 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.3428035Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3428499Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3429380Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3429860Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3430731Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3431081Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3431928Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3432357Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3433205Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3433661Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3434504Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3434892Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3435775Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3436208Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3437701Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 716111872 and is now 10516103168. 2025-12-04T10:13:48.3438017Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3438599Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3439621Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3439964Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3440600Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3441076Z [rank0]:E1204 09:55:21.197000 75705 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.3441474Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3441938Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3442847Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3443303Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3444172Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3444524Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3445368Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3445823Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3446678Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3447106Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3447953Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3448367Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3449218Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3449647Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3451145Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 611254272 and is now 10404954112. 
2025-12-04T10:13:48.3451465Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3452041Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3453093Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3453490Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3454361Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3454902Z [rank1]:E1204 09:55:21.197000 75706 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.3455344Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3455908Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3456900Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3457405Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3458384Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3458778Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3459762Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3460245Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3461200Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3461677Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3462673Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3463113Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3464072Z [rank3]:E1204 09:55:21.197000 75708 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3464553Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3466305Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T10:13:48.3466630Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3467229Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3468254Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3468568Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3469202Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3469712Z [rank3]:E1204 09:55:21.197000 75708 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.3469802Z dist init r=2, world=4 2025-12-04T10:13:48.3469891Z dist init r=0, world=4 2025-12-04T10:13:48.3469974Z dist init r=1, world=4 2025-12-04T10:13:48.3470061Z dist init r=3, world=4 2025-12-04T10:13:48.3471080Z [rank2]:[W1204 09:55:21.705567135 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3472082Z [rank1]:[W1204 09:55:21.713278491 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3473095Z [rank0]:[W1204 09:55:21.713398757 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3474128Z [rank3]:[W1204 09:55:21.717416284 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3474220Z FAILED [27.7631s] [100%] 2025-12-04T10:13:48.3474226Z 2025-12-04T10:13:48.3474350Z =================================== FAILURES =================================== 2025-12-04T10:13:48.3474665Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:48.3474768Z Traceback (most recent call last): 2025-12-04T10:13:48.3475251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.3475356Z self._join_processes(fn) 2025-12-04T10:13:48.3475873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.3475997Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.3476532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.3476632Z raise RuntimeError(error) 2025-12-04T10:13:48.3476839Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.3476943Z Traceback (most recent call last): 2025-12-04T10:13:48.3477415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3477521Z getattr(self, test_name)() 2025-12-04T10:13:48.3477988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3478066Z fn() 2025-12-04T10:13:48.3478513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3478772Z method(*args, **kwargs) 2025-12-04T10:13:48.3479422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3479525Z method(*args, **kwargs) 2025-12-04T10:13:48.3480024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3480128Z with policy(): 2025-12-04T10:13:48.3480631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3480740Z raise RuntimeError(msg) 2025-12-04T10:13:48.3482028Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 611254272 and is now 10404954112. 
2025-12-04T10:13:48.3482038Z 2025-12-04T10:13:48.3482250Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3482964Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3482970Z 2025-12-04T10:13:48.3483229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3483235Z 2025-12-04T10:13:48.3483242Z 2025-12-04T10:13:48.3483463Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.3483721Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.3484518Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5ab6f8f72e0857a0.xml - 2025-12-04T10:13:48.3484728Z =========================== short test summary info ============================ 2025-12-04T10:13:48.3485590Z FAILED [27.7631s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.3485713Z Traceback (most recent call last): 2025-12-04T10:13:48.3486254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3486403Z getattr(self, test_name)() 2025-12-04T10:13:48.3486938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3487021Z fn() 2025-12-04T10:13:48.3487531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3487638Z method(*args, **kwargs) 2025-12-04T10:13:48.3488141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3488248Z method(*args, **kwargs) 2025-12-04T10:13:48.3488744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3488837Z with policy(): 2025-12-04T10:13:48.3489342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3489448Z raise RuntimeError(msg) 2025-12-04T10:13:48.3490690Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 611254272 and is now 10404954112. 2025-12-04T10:13:48.3490700Z 2025-12-04T10:13:48.3490909Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3491828Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3491834Z 2025-12-04T10:13:48.3492065Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3492221Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:13:48.3492382Z ====================== 1 failed, 32 deselected in 27.98s ======================= 2025-12-04T10:13:48.3492468Z Got exit code 1 2025-12-04T10:13:48.3493022Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.3493440Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.3494240Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-82a2bf200d1dcaa2.xml 2025-12-04T10:13:48.3494406Z ============================= test session starts ============================== 2025-12-04T10:13:48.3494747Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.3494850Z cachedir: .pytest_cache 2025-12-04T10:13:48.3495361Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.3495477Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.3495591Z configfile: pytest.ini 2025-12-04T10:13:48.3496116Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.3496327Z collecting ... collected 60 items / 17 deselected / 43 selected 2025-12-04T10:13:48.3496503Z stepcurrent: skipping 17 already run items. 2025-12-04T10:13:48.3496612Z Running 16 items in this shard 2025-12-04T10:13:48.3496617Z 2025-12-04T10:13:48.3497762Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda I1204 09:55:43.639000 76902 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 76954 2025-12-04T10:13:48.3498256Z I1204 09:55:43.640000 76902 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 76955 2025-12-04T10:13:48.3498740Z I1204 09:55:43.641000 76902 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 76956 2025-12-04T10:13:48.3499260Z I1204 09:55:43.642000 76902 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 76957 2025-12-04T10:13:48.3501264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3501367Z _warn_cpu_init() 2025-12-04T10:13:48.3503369Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.3503469Z _warn_cpu_init() 2025-12-04T10:13:48.3505597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3505694Z _warn_cpu_init() 2025-12-04T10:13:48.3507335Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3507482Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3509086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3509228Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3510731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3510873Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3512689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3512772Z _warn_cpu_init() 2025-12-04T10:13:48.3514272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3514444Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3515320Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.3515536Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.3516408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3516624Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.3517491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3517706Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.3519237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3519381Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3520878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3521051Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3521937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3522130Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.3523009Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3523203Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.3524075Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3524299Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.3525170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3525384Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.3526893Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3527065Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3527943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3528133Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.3532109Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.3532463Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.3533144Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3533295Z return func(*args, **kwargs) 2025-12-04T10:13:48.3537927Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.3538323Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.3539093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T10:13:48.3539230Z return func(*args, **kwargs) 2025-12-04T10:13:48.3543712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.3544203Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.3544973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3545077Z return func(*args, **kwargs) 2025-12-04T10:13:48.3549360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.3549711Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.3550385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3550479Z return func(*args, **kwargs) 2025-12-04T10:13:48.3551176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3551272Z return func(*args, **kwargs) 2025-12-04T10:13:48.3551944Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
2025-12-04T10:13:48.3552036Z return func(*args, **kwargs) 2025-12-04T10:13:48.3552705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3552806Z return func(*args, **kwargs) 2025-12-04T10:13:48.3553471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3553568Z return func(*args, **kwargs) 2025-12-04T10:13:48.3554442Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.3554562Z return func(*args, **kwargs) 2025-12-04T10:13:48.3554968Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3555439Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3556326Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3556796Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3557677Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3558029Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3558883Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3559315Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3560160Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3560596Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3561461Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3561852Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3562709Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3563139Z [rank0]:E1204 
09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3564996Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 718209024 and is now 10524491776. 2025-12-04T10:13:48.3565335Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3565958Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3567127Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3567496Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3568163Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3568669Z [rank0]:E1204 09:56:09.956000 76954 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.3569094Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3569616Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3570562Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3571032Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3571961Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3572327Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3573287Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3573925Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3574914Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3575396Z [rank2]:E1204 09:56:09.956000 76956 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3576348Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3576785Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3577773Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3578256Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3580241Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 604962816 and is now 10413342720. 2025-12-04T10:13:48.3580607Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3581262Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3582570Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3582933Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3583648Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3584188Z [rank2]:E1204 09:56:09.956000 76956 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.3584679Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3585205Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3586205Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3586707Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3587689Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3588082Z [rank1]:E1204 09:56:09.956000 76955 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3589228Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3590179Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3591876Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3592650Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3594249Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3595025Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3596561Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3597339Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3600702Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 1. CUDA driver allocated memory was 607059968 and is now 10413342720. 
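The repeated UserWarnings earlier in this test's output (module initialized on CPU, `device_id` passed as plain `cuda` without an index) point at making the per-rank device explicit before wrapping with FSDP. A minimal sketch of that setup, assuming a torchrun-style LOCAL_RANK and a placeholder build_model():

import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def setup_and_wrap(build_model):
    rank = int(os.environ["LOCAL_RANK"])                  # placeholder for however ranks are assigned
    torch.cuda.set_device(rank)                           # make the current device explicit on this rank
    dist.init_process_group("nccl", device_id=torch.device("cuda", rank))
    model = build_model()                                 # the module may start out on CPU
    # An explicit device_id lets FSDP move the module to the right GPU for sharding
    # initialization and is required for sync_module_states=True (needs GPU communication).
    return FSDP(model, device_id=rank, sync_module_states=True)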
2025-12-04T10:13:48.3601360Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3602645Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3604909Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3605596Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3607020Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3608060Z [rank1]:E1204 09:56:09.956000 76955 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.3608914Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3609852Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3611683Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3612562Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3614615Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3615368Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3617315Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3618252Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3620070Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3620967Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3622940Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3623762Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3625761Z [rank3]:E1204 09:56:09.957000 76957 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3626423Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3628021Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 611254272 and is now 10413342720. 2025-12-04T10:13:48.3628421Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3629004Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3630109Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3630649Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3631319Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3631832Z [rank3]:E1204 09:56:09.957000 76957 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.3631935Z dist init r=2, world=4 2025-12-04T10:13:48.3632023Z dist init r=3, world=4 2025-12-04T10:13:48.3632121Z dist init r=1, world=4 2025-12-04T10:13:48.3632211Z dist init r=0, world=4 2025-12-04T10:13:48.3633295Z [rank2]:[W1204 09:56:10.470769205 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3634375Z [rank3]:[W1204 09:56:10.474323732 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3635493Z [rank1]:[W1204 09:56:10.493245284 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3636561Z [rank0]:[W1204 09:56:10.496644671 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3636658Z FAILED [48.3023s] [ 6%] 2025-12-04T10:13:48.3636666Z 2025-12-04T10:13:48.3636810Z =================================== FAILURES =================================== 2025-12-04T10:13:48.3637195Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda _ 2025-12-04T10:13:48.3637341Z Traceback (most recent call last): 2025-12-04T10:13:48.3637860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.3637963Z self._join_processes(fn) 2025-12-04T10:13:48.3638603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.3638731Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.3639260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.3639365Z raise RuntimeError(error) 2025-12-04T10:13:48.3639569Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.3639672Z Traceback (most recent call last): 2025-12-04T10:13:48.3640154Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3640282Z getattr(self, test_name)() 2025-12-04T10:13:48.3640748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3640829Z fn() 2025-12-04T10:13:48.3641272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3641367Z method(*args, **kwargs) 2025-12-04T10:13:48.3641807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3641932Z method(*args, **kwargs) 2025-12-04T10:13:48.3642377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3642460Z with policy(): 2025-12-04T10:13:48.3642904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3643007Z raise RuntimeError(msg) 2025-12-04T10:13:48.3644188Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 611254272 and is now 10413342720. 
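The ProcessGroupNCCL warnings just above note that destroy_process_group() was never called before the worker processes exited. A minimal sketch of the recommended teardown, assuming the usual environment-based rendezvous:

import torch.distributed as dist

def run(rank: int, world_size: int) -> None:
    # Assumes MASTER_ADDR / MASTER_PORT are set in the environment (e.g. by torchrun).
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        ...  # training / test body for this rank
        dist.barrier()                # let all ranks finish their work first
    finally:
        dist.destroy_process_group()  # explicit teardown avoids the resource-leak warning at exit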
2025-12-04T10:13:48.3644195Z 2025-12-04T10:13:48.3644387Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3645087Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3645095Z 2025-12-04T10:13:48.3645333Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3645338Z 2025-12-04T10:13:48.3645342Z 2025-12-04T10:13:48.3645536Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.3645766Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.3646507Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-82a2bf200d1dcaa2.xml - 2025-12-04T10:13:48.3646657Z =========================== short test summary info ============================ 2025-12-04T10:13:48.3647509Z FAILED [48.3023s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.3647614Z Traceback (most recent call last): 2025-12-04T10:13:48.3648097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3648196Z getattr(self, test_name)() 2025-12-04T10:13:48.3648693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3648777Z fn() 2025-12-04T10:13:48.3649219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3649307Z method(*args, **kwargs) 2025-12-04T10:13:48.3649752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3649841Z method(*args, **kwargs) 2025-12-04T10:13:48.3650280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3650374Z with policy(): 2025-12-04T10:13:48.3650821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3650921Z raise RuntimeError(msg) 2025-12-04T10:13:48.3652119Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 611254272 and is now 10413342720. 2025-12-04T10:13:48.3652125Z 2025-12-04T10:13:48.3652319Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3653018Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3653050Z 2025-12-04T10:13:48.3653382Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3653542Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
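The FutureWarnings throughout this run flag the `NO_SHARD` sharding strategy as deprecated and point to `DistributedDataParallel` instead. A rough sketch of that replacement, with `model` and `rank` as placeholders:

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_without_sharding(model: torch.nn.Module, rank: int) -> torch.nn.Module:
    model = model.to(f"cuda:{rank}")      # DDP expects the module already on its target device
    return DDP(model, device_ids=[rank])  # replicates parameters, the behaviour NO_SHARD emulated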
2025-12-04T10:13:48.3653884Z ====================== 1 failed, 17 deselected in 48.52s ======================= 2025-12-04T10:13:48.3653984Z Got exit code 1 2025-12-04T10:13:48.3654093Z Retrying single test... 2025-12-04T10:13:48.3654712Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a014b9bd1b37d049.xml 2025-12-04T10:13:48.3654874Z ============================= test session starts ============================== 2025-12-04T10:13:48.3655215Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.3655317Z cachedir: .pytest_cache 2025-12-04T10:13:48.3655832Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.3655951Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.3656061Z configfile: pytest.ini 2025-12-04T10:13:48.3656594Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.3656811Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.3657725Z stepcurrent: skipping 17 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3657834Z Running 1 items in this shard 2025-12-04T10:13:48.3657840Z 2025-12-04T10:13:48.3658991Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda I1204 09:56:36.610000 78007 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 78059 2025-12-04T10:13:48.3659486Z I1204 09:56:36.611000 78007 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 78060 2025-12-04T10:13:48.3659968Z I1204 09:56:36.611000 78007 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 78061 2025-12-04T10:13:48.3660490Z I1204 09:56:36.612000 78007 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 78062 2025-12-04T10:13:48.3662501Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3662606Z _warn_cpu_init() 2025-12-04T10:13:48.3664612Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.3664747Z _warn_cpu_init() 2025-12-04T10:13:48.3666842Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3666961Z _warn_cpu_init() 2025-12-04T10:13:48.3668468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3668614Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3670121Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3670268Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3671953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3672105Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3674004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3674093Z _warn_cpu_init() 2025-12-04T10:13:48.3675718Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3675874Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3676811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.3677034Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.3677960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3678192Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.3680224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3680396Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3682084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3682346Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3683343Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3683564Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.3684554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3684765Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.3685756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3685994Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.3687741Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3687905Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3688890Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.3689112Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.3690127Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3690372Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.3691561Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3691754Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.3696245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.3696691Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.3697491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3697601Z return func(*args, **kwargs) 2025-12-04T10:13:48.3702081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T10:13:48.3702483Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.3703283Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3703389Z return func(*args, **kwargs) 2025-12-04T10:13:48.3707763Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.3708118Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.3708802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3708899Z return func(*args, **kwargs) 2025-12-04T10:13:48.3712880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.3713277Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.3713955Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3714050Z return func(*args, **kwargs) 2025-12-04T10:13:48.3714724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
2025-12-04T10:13:48.3714818Z return func(*args, **kwargs) 2025-12-04T10:13:48.3715488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3715579Z return func(*args, **kwargs) 2025-12-04T10:13:48.3716245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3716344Z return func(*args, **kwargs) 2025-12-04T10:13:48.3717007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T10:13:48.3717107Z return func(*args, **kwargs) 2025-12-04T10:13:48.3718193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.3718289Z return func(*args, **kwargs) 2025-12-04T10:13:48.3718726Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3719224Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3720194Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3720668Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3721602Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3721971Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3722869Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3723331Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3724259Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3724717Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3725610Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3726060Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3726957Z [rank0]:E1204 
09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3727412Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3729088Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 714014720 and is now 10524491776. 2025-12-04T10:13:48.3729427Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3730327Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3731523Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3731866Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3732531Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3733037Z [rank0]:E1204 09:57:03.985000 78059 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.3733521Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3734243Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3735250Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3735751Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3736739Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3737131Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3738087Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3738619Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3739577Z [rank1]:E1204 09:57:03.985000 78060 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3740066Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3741046Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3741493Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3742452Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3742935Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3744710Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 1. CUDA driver allocated memory was 607059968 and is now 10413342720. 2025-12-04T10:13:48.3745070Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3745839Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3747152Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3747481Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3748114Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3748596Z [rank1]:E1204 09:57:03.985000 78060 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.3749023Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3749491Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3750376Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3750819Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3751694Z [rank2]:E1204 09:57:03.986000 78061 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3752068Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3752917Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3753348Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3754189Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3754650Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3755491Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3755889Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3756732Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3757162Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3758749Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 609157120 and is now 10413342720. 
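The leak numbers reported above come from comparing per-device memory counters before and after the test body. The following is a minimal, illustrative sketch of that kind of comparison (not the actual harness code in common_utils.py), using only public torch.cuda APIs; run_test and device are placeholders.

import torch

def _snapshot(device: int):
    # Bytes currently held by the caching allocator on this device.
    allocator_bytes = torch.cuda.memory_allocated(device)
    # Bytes the CUDA driver has handed out on this device: total minus free.
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    return allocator_bytes, total_bytes - free_bytes

def check_for_leak(run_test, device: int = 0):
    torch.cuda.synchronize(device)
    alloc_before, driver_before = _snapshot(device)
    run_test()
    torch.cuda.synchronize(device)
    alloc_after, driver_after = _snapshot(device)
    if alloc_after > alloc_before:
        raise RuntimeError(
            f"Caching allocator allocated memory was {alloc_before} and is now "
            f"reported as {alloc_after} on device {device}. CUDA driver allocated "
            f"memory was {driver_before} and is now {driver_after}."
        )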
2025-12-04T10:13:48.3759093Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3759678Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3760772Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3761098Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3761749Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3762230Z [rank2]:E1204 09:57:03.986000 78061 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.3762630Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3763090Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3763972Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3764417Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3765319Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3765666Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3766506Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3766940Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3767807Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3768300Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3769778Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3770578Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3771483Z [rank3]:E1204 09:57:03.986000 78062 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3771943Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3773955Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 334430208 and is now 10413342720. 2025-12-04T10:13:48.3774320Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3774982Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3776256Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3776624Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3777335Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3777877Z [rank3]:E1204 09:57:03.986000 78062 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.3777982Z dist init r=2, world=4 2025-12-04T10:13:48.3778076Z dist init r=3, world=4 2025-12-04T10:13:48.3778177Z dist init r=0, world=4 2025-12-04T10:13:48.3778271Z dist init r=1, world=4 2025-12-04T10:13:48.3779610Z [rank2]:[W1204 09:57:04.497410569 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3780843Z [rank3]:[W1204 09:57:04.498095432 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3781979Z [rank0]:[W1204 09:57:04.498365072 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3783113Z [rank1]:[W1204 09:57:04.499913412 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3783257Z FAILED [50.4347s] [100%] 2025-12-04T10:13:48.3783268Z 2025-12-04T10:13:48.3783425Z =================================== FAILURES =================================== 2025-12-04T10:13:48.3783838Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda _ 2025-12-04T10:13:48.3783957Z Traceback (most recent call last): 2025-12-04T10:13:48.3784507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.3784617Z self._join_processes(fn) 2025-12-04T10:13:48.3785200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.3785348Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.3785948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.3786064Z raise RuntimeError(error) 2025-12-04T10:13:48.3786298Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.3786414Z Traceback (most recent call last): 2025-12-04T10:13:48.3786995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3787104Z getattr(self, test_name)() 2025-12-04T10:13:48.3787638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3787727Z fn() 2025-12-04T10:13:48.3788223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3788332Z method(*args, **kwargs) 2025-12-04T10:13:48.3788825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3788924Z method(*args, **kwargs) 2025-12-04T10:13:48.3789487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3789582Z with policy(): 2025-12-04T10:13:48.3790089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3790191Z raise RuntimeError(msg) 2025-12-04T10:13:48.3791497Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 714014720 and is now 10524491776. 
2025-12-04T10:13:48.3791506Z 2025-12-04T10:13:48.3791701Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3792404Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3792438Z 2025-12-04T10:13:48.3792683Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3792688Z 2025-12-04T10:13:48.3792693Z 2025-12-04T10:13:48.3792883Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.3793111Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.3793822Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a014b9bd1b37d049.xml - 2025-12-04T10:13:48.3793968Z =========================== short test summary info ============================ 2025-12-04T10:13:48.3794841Z FAILED [50.4347s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.3794945Z Traceback (most recent call last): 2025-12-04T10:13:48.3795430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3795527Z getattr(self, test_name)() 2025-12-04T10:13:48.3796003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3796084Z fn() 2025-12-04T10:13:48.3796526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3796614Z method(*args, **kwargs) 2025-12-04T10:13:48.3797066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3797156Z method(*args, **kwargs) 2025-12-04T10:13:48.3797604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3797689Z with policy(): 2025-12-04T10:13:48.3798133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3798231Z raise RuntimeError(msg) 2025-12-04T10:13:48.3799423Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 714014720 and is now 10524491776. 2025-12-04T10:13:48.3799429Z 2025-12-04T10:13:48.3799623Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3800324Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3800329Z 2025-12-04T10:13:48.3800555Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3800741Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
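The repro instructions above name two environment variables: PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables the leak check for the single test, and PYTORCH_PRINT_REPRO_ON_FAILURE=0 hides the repro banner. A small sketch of driving that command from Python; the path and test name are copied verbatim from the log.

import os
import subprocess

env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
# To silence the repro banner instead, set env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0".
subprocess.run(
    [
        "python",
        "test/distributed/fsdp/test_fsdp_core.py",
        "TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda",
    ],
    env=env,
    check=True,  # raise if the test process exits non-zero
)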
2025-12-04T10:13:48.3800894Z ====================== 1 failed, 32 deselected in 50.65s ======================= 2025-12-04T10:13:48.3800984Z Got exit code 1 2025-12-04T10:13:48.3801072Z Retrying single test... 2025-12-04T10:13:48.3801621Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-95f17e90ca4b9755.xml 2025-12-04T10:13:48.3801760Z ============================= test session starts ============================== 2025-12-04T10:13:48.3802061Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.3802153Z cachedir: .pytest_cache 2025-12-04T10:13:48.3802607Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.3802708Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.3802832Z configfile: pytest.ini 2025-12-04T10:13:48.3803299Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.3803489Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.3804264Z stepcurrent: skipping 17 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3804357Z Running 1 items in this shard 2025-12-04T10:13:48.3804362Z 2025-12-04T10:13:48.3805381Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda I1204 09:57:31.619000 79112 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 79164 2025-12-04T10:13:48.3805851Z I1204 09:57:31.620000 79112 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 79165 2025-12-04T10:13:48.3806281Z I1204 09:57:31.621000 79112 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 79166 2025-12-04T10:13:48.3806715Z I1204 09:57:31.622000 79112 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 79167 2025-12-04T10:13:48.3808499Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3808589Z _warn_cpu_init() 2025-12-04T10:13:48.3810118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:48.3810270Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3812035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3812125Z _warn_cpu_init() 2025-12-04T10:13:48.3814205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3814310Z _warn_cpu_init() 2025-12-04T10:13:48.3816294Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3816419Z _warn_cpu_init() 2025-12-04T10:13:48.3818136Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3818296Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3819991Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3820179Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3821886Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3822043Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3823029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.3823269Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.3824992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3825156Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3826322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3826518Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.3827390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3827631Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.3828507Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3828696Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.3829577Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3829782Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.3831287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3831453Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3832328Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3832534Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T10:13:48.3834031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:48.3834211Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3835086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3835281Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.3836153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.3836346Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.3840348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.3840705Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.3841416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3841514Z return func(*args, **kwargs) 2025-12-04T10:13:48.3845495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T10:13:48.3845871Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.3849819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.3850195Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.3850876Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3850975Z return func(*args, **kwargs) 2025-12-04T10:13:48.3851649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3851748Z return func(*args, **kwargs) 2025-12-04T10:13:48.3856243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.3856673Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.3857437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3857546Z return func(*args, **kwargs) 2025-12-04T10:13:48.3858303Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T10:13:48.3858411Z return func(*args, **kwargs) 2025-12-04T10:13:48.3859165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3859269Z return func(*args, **kwargs) 2025-12-04T10:13:48.3860019Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3860158Z return func(*args, **kwargs) 2025-12-04T10:13:48.3860911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3861018Z return func(*args, **kwargs) 2025-12-04T10:13:48.3862012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.3862151Z return func(*args, **kwargs) 2025-12-04T10:13:48.3862611Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3863136Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3864140Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3864642Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3865739Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3866217Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3867067Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3867498Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3868373Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3868801Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3869639Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3870036Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3870911Z [rank0]:E1204 
09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3871337Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3872907Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 716111872 and is now 10524491776. 2025-12-04T10:13:48.3873227Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3873836Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3874936Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3875256Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3875884Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3876391Z [rank0]:E1204 09:57:57.213000 79164 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.3876796Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3877261Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3878140Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3878584Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3879843Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3880240Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3881251Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3881739Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3882691Z [rank1]:E1204 09:57:57.213000 79165 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3883176Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3884160Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3884609Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3885560Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3886043Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3887830Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 1. CUDA driver allocated memory was 611254272 and is now 10413342720. 2025-12-04T10:13:48.3892063Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3892693Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3894081Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3894564Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3895282Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3895822Z [rank1]:E1204 09:57:57.213000 79165 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.3896276Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3896801Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3897805Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3898311Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3899292Z [rank3]:E1204 09:57:57.214000 79167 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3899720Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3900673Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3901161Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3902119Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3902640Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3903591Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3904027Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3904982Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3905465Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3907335Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 604962816 and is now 10413342720. 
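The _init_utils.py warnings emitted earlier in this run suggest either calling torch.cuda.set_device() before FSDP construction or passing device_id with an explicit index instead of the bare "cuda" device. A minimal sketch of that fix follows; model and rank are placeholders, not objects from the test.

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_with_explicit_device(model: torch.nn.Module, rank: int) -> FSDP:
    # Make the per-rank device explicit before FSDP initialization...
    torch.cuda.set_device(rank)
    # ...and pass a device with an explicit index rather than the bare "cuda".
    return FSDP(model, device_id=torch.device("cuda", rank))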
2025-12-04T10:13:48.3907654Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3908239Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3909365Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3909687Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3910316Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3910794Z [rank3]:E1204 09:57:57.214000 79167 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.3911192Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.3911660Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.3912541Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3912988Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.3913883Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3914234Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.3915077Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3915511Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3916385Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3916816Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.3917653Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3918039Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.3918892Z [rank2]:E1204 09:57:57.214000 79166 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3919349Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.3920925Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 607059968 and is now 10413342720. 2025-12-04T10:13:48.3921271Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3921848Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3922949Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3923270Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.3923897Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3924372Z [rank2]:E1204 09:57:57.214000 79166 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.3924466Z dist init r=0, world=4 2025-12-04T10:13:48.3924549Z dist init r=3, world=4 2025-12-04T10:13:48.3924629Z dist init r=2, world=4 2025-12-04T10:13:48.3924715Z dist init r=1, world=4 2025-12-04T10:13:48.3925762Z [rank3]:[W1204 09:57:57.724572131 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3926765Z [rank0]:[W1204 09:57:57.726280386 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3927757Z [rank2]:[W1204 09:57:57.728260554 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3928796Z [rank1]:[W1204 09:57:57.751286662 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.3928883Z FAILED [45.7989s] [100%] 2025-12-04T10:13:48.3928890Z 2025-12-04T10:13:48.3929016Z =================================== FAILURES =================================== 2025-12-04T10:13:48.3929379Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda _ 2025-12-04T10:13:48.3929482Z Traceback (most recent call last): 2025-12-04T10:13:48.3929962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.3930057Z self._join_processes(fn) 2025-12-04T10:13:48.3930567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.3930693Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.3931251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.3931353Z raise RuntimeError(error) 2025-12-04T10:13:48.3931556Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.3931658Z Traceback (most recent call last): 2025-12-04T10:13:48.3932135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3932227Z getattr(self, test_name)() 2025-12-04T10:13:48.3932695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3932802Z fn() 2025-12-04T10:13:48.3933312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3933408Z method(*args, **kwargs) 2025-12-04T10:13:48.3934054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3934153Z method(*args, **kwargs) 2025-12-04T10:13:48.3934654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3934747Z with policy(): 2025-12-04T10:13:48.3935244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3935354Z raise RuntimeError(msg) 2025-12-04T10:13:48.3936683Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 604962816 and is now 10413342720. 
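The failure above comes from the mem_leak_check harness: the `with policy():` and `__exit__` frames wrap the test body in a guard that snapshots CUDA memory before the test and re-checks it afterwards, comparing the caching-allocator counter with the driver-level allocation (the "512 ... is now 215552" and "604962816 ... is now 10413342720" numbers). The sketch below is only an illustration of that kind of guard, not PyTorch's internal CudaMemLeakCheck; the class name, default device index, and the simple "any growth is a leak" rule are assumptions.
import torch

class LeakGuardSketch:
    """Illustrative CUDA memory-leak guard; not PyTorch's internal implementation."""

    def __init__(self, device: int = 0):
        self.device = device

    def __enter__(self):
        torch.cuda.synchronize(self.device)
        # Caching-allocator view and driver view (free/total from cudaMemGetInfo).
        self.alloc_before = torch.cuda.memory_allocated(self.device)
        free, total = torch.cuda.mem_get_info(self.device)
        self.driver_before = total - free
        return self

    def __exit__(self, exc_type, exc, tb):
        torch.cuda.synchronize(self.device)
        alloc_after = torch.cuda.memory_allocated(self.device)
        free, total = torch.cuda.mem_get_info(self.device)
        driver_after = total - free
        if exc_type is None and alloc_after > self.alloc_before:
            raise RuntimeError(
                f"possible leak on device {self.device}: allocator "
                f"{self.alloc_before} -> {alloc_after}, "
                f"driver {self.driver_before} -> {driver_after}"
            )
        return False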
2025-12-04T10:13:48.3936693Z 2025-12-04T10:13:48.3936909Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3937705Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3937746Z 2025-12-04T10:13:48.3938011Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3938016Z 2025-12-04T10:13:48.3938021Z 2025-12-04T10:13:48.3938235Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.3938492Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.3939297Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-95f17e90ca4b9755.xml - 2025-12-04T10:13:48.3939461Z =========================== short test summary info ============================ 2025-12-04T10:13:48.3940437Z FAILED [45.7989s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.3940556Z Traceback (most recent call last): 2025-12-04T10:13:48.3941101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.3941212Z getattr(self, test_name)() 2025-12-04T10:13:48.3941744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.3941836Z fn() 2025-12-04T10:13:48.3942337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3942436Z method(*args, **kwargs) 2025-12-04T10:13:48.3942939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.3943086Z method(*args, **kwargs) 2025-12-04T10:13:48.3943582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.3943680Z with policy(): 2025-12-04T10:13:48.3944185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.3944291Z raise RuntimeError(msg) 2025-12-04T10:13:48.3945623Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 604962816 and is now 10413342720. 2025-12-04T10:13:48.3945658Z 2025-12-04T10:13:48.3945976Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.3946681Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3946688Z 2025-12-04T10:13:48.3946921Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.3947082Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
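The repro banner above gives the exact standalone command. As a sketch, the same invocation can be driven from Python with the environment variables the log names; running from the base repo dir is assumed, and check=False simply lets the test's exit code through rather than raising.
import os
import subprocess

# Command copied from the repro banner; PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables the
# leak check, while PYTORCH_PRINT_REPRO_ON_FAILURE=0 would silence the banner instead.
env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
cmd = [
    "python",
    "test/distributed/fsdp/test_fsdp_core.py",
    "TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda",
]
result = subprocess.run(cmd, env=env, check=False)  # run from the base repo dir
print("exit code:", result.returncode)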
2025-12-04T10:13:48.3947238Z ====================== 1 failed, 32 deselected in 46.02s ======================= 2025-12-04T10:13:48.3947318Z Got exit code 1 2025-12-04T10:13:48.3947955Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T10:13:48.3948313Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.3948867Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7ff8c73ee302c339.xml 2025-12-04T10:13:48.3949010Z ============================= test session starts ============================== 2025-12-04T10:13:48.3949310Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.3949437Z cachedir: .pytest_cache 2025-12-04T10:13:48.3949886Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.3949993Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.3950081Z configfile: pytest.ini 2025-12-04T10:13:48.3950547Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.3950740Z collecting ... collected 60 items / 18 deselected / 42 selected 2025-12-04T10:13:48.3950859Z stepcurrent: skipping 18 already run items. 2025-12-04T10:13:48.3950953Z Running 15 items in this shard 2025-12-04T10:13:48.3950959Z 2025-12-04T10:13:48.3951997Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda I1204 09:58:22.700000 80217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 80269 2025-12-04T10:13:48.3952435Z I1204 09:58:22.701000 80217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 80270 2025-12-04T10:13:48.3952866Z I1204 09:58:22.701000 80217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 80271 2025-12-04T10:13:48.3953297Z I1204 09:58:22.702000 80217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 80272 2025-12-04T10:13:48.3955089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3955199Z _warn_cpu_init() 2025-12-04T10:13:48.3956978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.3957156Z _warn_cpu_init() 2025-12-04T10:13:48.3958927Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.3959018Z _warn_cpu_init() 2025-12-04T10:13:48.3959917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.3960007Z _init_core_state( 2025-12-04T10:13:48.3960897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.3960986Z _init_core_state( 2025-12-04T10:13:48.3961873Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.3961957Z _init_core_state( 2025-12-04T10:13:48.3963491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3963635Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3965166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3965309Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3966814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3966953Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3968732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.3968841Z _warn_cpu_init() 2025-12-04T10:13:48.3969732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.3969818Z _init_core_state( 2025-12-04T10:13:48.3971310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3971485Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3972985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3973128Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3975000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3975162Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3976886Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.3977042Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.3981803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T10:13:48.3982207Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.3986665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.3987129Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.3991664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.3992012Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.3995988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T10:13:48.3996556Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.3997807Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3997973Z return func(*args, **kwargs) 2025-12-04T10:13:48.3999186Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.3999357Z return func(*args, **kwargs) 2025-12-04T10:13:48.4000617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4000795Z return func(*args, **kwargs) 2025-12-04T10:13:48.4001958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4002218Z return func(*args, **kwargs) 2025-12-04T10:13:48.4003408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4003576Z return func(*args, **kwargs) 2025-12-04T10:13:48.4004763Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4004931Z return func(*args, **kwargs) 2025-12-04T10:13:48.4006449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4006626Z return func(*args, **kwargs) 2025-12-04T10:13:48.4008174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4008357Z return func(*args, **kwargs) 2025-12-04T10:13:48.4010113Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
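The barrier() warning that closes the block above suggests binding each rank to its device explicitly. Below is a minimal sketch under two assumptions: one GPU per rank with the rank index taken from the torchrun-style LOCAL_RANK variable, and a PyTorch build recent enough to accept `device_id` in `init_process_group`, as the warning itself states.
import os
import torch
import torch.distributed as dist

def init_rank() -> None:
    # LOCAL_RANK is the usual torchrun-provided variable; an assumption here.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)  # make the current device explicit for this rank
    dist.init_process_group(
        backend="nccl",
        device_id=torch.device("cuda", local_rank),  # silences the barrier() device warning
    )
    dist.barrier()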
2025-12-04T10:13:48.4010296Z return func(*args, **kwargs) 2025-12-04T10:13:48.4011105Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4012080Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4014233Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4015227Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4017233Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4017978Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4019757Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4020658Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4022528Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4023454Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4025376Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4026276Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4027884Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4028800Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4031772Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 720306176 and is now 10532880384. 
2025-12-04T10:13:48.4032374Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4033406Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4034581Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4034924Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4035594Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4036103Z [rank0]:E1204 09:58:49.380000 80269 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.4036535Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4037031Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4038030Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4038501Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4039423Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4039800Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4040731Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4041192Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4042082Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4042539Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4043435Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4043848Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4044781Z [rank3]:E1204 09:58:49.382000 80272 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4045234Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4046890Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T10:13:48.4047258Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4047881Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4049108Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4049433Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4050059Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4050539Z [rank3]:E1204 09:58:49.382000 80272 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.4050941Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4051433Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4052315Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4052759Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4053931Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4054368Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4055322Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4055807Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4056757Z [rank2]:E1204 09:58:49.383000 80271 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4057241Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4058190Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4058664Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4059630Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4060112Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4061901Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 607059968 and is now 10421731328. 2025-12-04T10:13:48.4062264Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4062918Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4064142Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4064505Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4065214Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4065801Z [rank2]:E1204 09:58:49.383000 80271 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.4066314Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4066779Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4067661Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4068108Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4069003Z [rank1]:E1204 09:58:49.383000 80270 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4069356Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4070195Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4070625Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4071470Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4071927Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4072768Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4073157Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4074005Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4074462Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4076027Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 609157120 and is now 10421731328. 
2025-12-04T10:13:48.4076342Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4076922Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4078005Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4078326Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4079317Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4079860Z [rank1]:E1204 09:58:49.383000 80270 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.4079962Z dist init r=2, world=4 2025-12-04T10:13:48.4080055Z dist init r=0, world=4 2025-12-04T10:13:48.4080147Z dist init r=1, world=4 2025-12-04T10:13:48.4080249Z dist init r=3, world=4 2025-12-04T10:13:48.4081397Z [rank0]:[W1204 09:58:49.889811273 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4082605Z [rank3]:[W1204 09:58:49.892249189 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4083732Z [rank2]:[W1204 09:58:49.892793005 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4084869Z [rank1]:[W1204 09:58:49.893518684 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4084969Z FAILED [48.6843s] [ 6%] 2025-12-04T10:13:48.4084977Z 2025-12-04T10:13:48.4085163Z =================================== FAILURES =================================== 2025-12-04T10:13:48.4085562Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda _ 2025-12-04T10:13:48.4085683Z Traceback (most recent call last): 2025-12-04T10:13:48.4086231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.4086338Z self._join_processes(fn) 2025-12-04T10:13:48.4086920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.4087062Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.4087709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.4087824Z raise RuntimeError(error) 2025-12-04T10:13:48.4088053Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.4088173Z Traceback (most recent call last): 2025-12-04T10:13:48.4088714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4088822Z getattr(self, test_name)() 2025-12-04T10:13:48.4089345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4089438Z fn() 2025-12-04T10:13:48.4089939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4090046Z method(*args, **kwargs) 2025-12-04T10:13:48.4090542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4090640Z method(*args, **kwargs) 2025-12-04T10:13:48.4091253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4091346Z with policy(): 2025-12-04T10:13:48.4091945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4092045Z raise RuntimeError(msg) 2025-12-04T10:13:48.4093264Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 609157120 and is now 10421731328. 
2025-12-04T10:13:48.4093274Z 2025-12-04T10:13:48.4093470Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4094392Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4094401Z 2025-12-04T10:13:48.4094704Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4094710Z 2025-12-04T10:13:48.4094715Z 2025-12-04T10:13:48.4094937Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.4095196Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.4095998Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7ff8c73ee302c339.xml - 2025-12-04T10:13:48.4096161Z =========================== short test summary info ============================ 2025-12-04T10:13:48.4097102Z FAILED [48.6843s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.4097218Z Traceback (most recent call last): 2025-12-04T10:13:48.4097791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4097902Z getattr(self, test_name)() 2025-12-04T10:13:48.4098430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4098521Z fn() 2025-12-04T10:13:48.4099024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4099124Z method(*args, **kwargs) 2025-12-04T10:13:48.4099629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4099764Z method(*args, **kwargs) 2025-12-04T10:13:48.4100259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4100357Z with policy(): 2025-12-04T10:13:48.4100865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4100974Z raise RuntimeError(msg) 2025-12-04T10:13:48.4102283Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 609157120 and is now 10421731328. 2025-12-04T10:13:48.4102290Z 2025-12-04T10:13:48.4102500Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4103274Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4103283Z 2025-12-04T10:13:48.4103539Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4103722Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
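The ProcessGroupNCCL warnings earlier in this run ("destroy_process_group() was not called before program exit, which can leak resources") point at missing cleanup. A small sketch of the shutdown pattern the warning asks for, assuming the process group was initialized elsewhere in the script; the helper name is hypothetical.
import torch.distributed as dist

def shutdown_distributed() -> None:
    # Tear down the default process group before the interpreter exits,
    # as the ProcessGroupNCCL warning recommends.
    if dist.is_available() and dist.is_initialized():
        dist.destroy_process_group()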
2025-12-04T10:13:48.4103894Z ====================== 1 failed, 18 deselected in 48.90s ======================= 2025-12-04T10:13:48.4103985Z Got exit code 1 2025-12-04T10:13:48.4104118Z Retrying single test... 2025-12-04T10:13:48.4104740Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0cb6a3efe573e986.xml 2025-12-04T10:13:48.4104900Z ============================= test session starts ============================== 2025-12-04T10:13:48.4105242Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.4105344Z cachedir: .pytest_cache 2025-12-04T10:13:48.4106064Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.4106167Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.4106259Z configfile: pytest.ini 2025-12-04T10:13:48.4106763Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.4106951Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.4107705Z stepcurrent: skipping 18 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4107800Z Running 1 items in this shard 2025-12-04T10:13:48.4107805Z 2025-12-04T10:13:48.4108799Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda I1204 09:59:16.230000 81322 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 81374 2025-12-04T10:13:48.4109242Z I1204 09:59:16.231000 81322 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 81375 2025-12-04T10:13:48.4109701Z I1204 09:59:16.231000 81322 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 81376 2025-12-04T10:13:48.4110137Z I1204 09:59:16.232000 81322 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 81377 2025-12-04T10:13:48.4112922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.4113345Z _warn_cpu_init() 2025-12-04T10:13:48.4115298Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.4115396Z _warn_cpu_init() 2025-12-04T10:13:48.4116353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.4116439Z _init_core_state( 2025-12-04T10:13:48.4118043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4118197Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4120111Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.4120203Z _warn_cpu_init() 2025-12-04T10:13:48.4121160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.4121247Z _init_core_state( 2025-12-04T10:13:48.4123225Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.4123316Z _warn_cpu_init() 2025-12-04T10:13:48.4124822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4125006Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4125906Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.4125992Z _init_core_state( 2025-12-04T10:13:48.4127491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
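The FSDP warnings in this block say the module starts on CPU and that `device_id` was passed as plain `cuda` without an index. A sketch of the initialization they recommend follows; `MyModel` is a hypothetical stand-in for the test's module, LOCAL_RANK is a torchrun-convention assumption, and an already-initialized process group is assumed.
import os
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

class MyModel(nn.Module):  # hypothetical stand-in for the test's module
    def __init__(self) -> None:
        super().__init__()
        self.linear = nn.Linear(16, 16)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)

# Assumes torch.distributed.init_process_group() has already been called for this rank.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)              # explicit current device, per the warning
model = FSDP(MyModel(), device_id=local_rank)  # explicit index moves the CPU module to this GPU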
2025-12-04T10:13:48.4127669Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4128565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.4128648Z _init_core_state( 2025-12-04T10:13:48.4130155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4130295Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4131800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4131944Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4133568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4133891Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4135612Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4135774Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4140249Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T10:13:48.4140674Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4145161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.4145685Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4149789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.4150135Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4154106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
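The AccumulateGrad stream-mismatch warning names its own opt-out. If the mismatch is known to be benign (for example, the autograd graph is deliberately kept alive across iterations), it can be silenced with the call quoted in the warning; a one-line sketch:

    import torch
    # Suppress the stream-mismatch warning when the mismatch is intentional
    # (API name taken from the warning text above).
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)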
2025-12-04T10:13:48.4154453Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4155135Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4155236Z return func(*args, **kwargs) 2025-12-04T10:13:48.4155910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4156040Z return func(*args, **kwargs) 2025-12-04T10:13:48.4156716Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4156808Z return func(*args, **kwargs) 2025-12-04T10:13:48.4157485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4157604Z return func(*args, **kwargs) 2025-12-04T10:13:48.4158269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4158362Z return func(*args, **kwargs) 2025-12-04T10:13:48.4159029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4159128Z return func(*args, **kwargs) 2025-12-04T10:13:48.4159789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4159877Z return func(*args, **kwargs) 2025-12-04T10:13:48.4160545Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4160637Z return func(*args, **kwargs) 2025-12-04T10:13:48.4161521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
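The barrier() warning at the end of the line above, and the destroy_process_group() warnings further down, both point at process-group hygiene: give the collective an explicit device (or pass device_id to init_process_group) and tear the group down before the process exits. A small sketch, assuming rank is the local GPU index; illustrative only:

    import torch.distributed as dist

    def shutdown(rank: int) -> None:
        # Passing device_ids avoids the "barrier(): using the device under current context" warning.
        dist.barrier(device_ids=[rank])
        # Explicit teardown releases NCCL resources and silences the
        # "destroy_process_group() was not called before program exit" warning.
        dist.destroy_process_group()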
2025-12-04T10:13:48.4161617Z return func(*args, **kwargs) 2025-12-04T10:13:48.4162024Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4162524Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4163412Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4163860Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4164751Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4165105Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4165950Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4166377Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4167225Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4167651Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4168541Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4168929Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4169776Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4170235Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4171791Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 716111872 and is now 10532880384. 
2025-12-04T10:13:48.4172116Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4172690Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4174041Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4174403Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4175119Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4175691Z [rank0]:E1204 09:59:44.703000 81374 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.4176135Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4176664Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4177656Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4178192Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4179369Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4179766Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4180722Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4181204Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4182158Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4182707Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4183654Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4184091Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4185083Z [rank1]:E1204 09:59:44.704000 81375 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4185571Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4187337Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T10:13:48.4187703Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4188350Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4189578Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4189936Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4190894Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4191406Z [rank1]:E1204 09:59:44.704000 81375 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.4191827Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4192324Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4193359Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4193809Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4194680Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4195026Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4195882Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4196403Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4197258Z [rank2]:E1204 09:59:44.704000 81376 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4197680Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4198525Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4198944Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4199796Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4200232Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4201796Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 607059968 and is now 10421731328. 2025-12-04T10:13:48.4202123Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4202703Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4203828Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4204147Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4204783Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4205264Z [rank2]:E1204 09:59:44.704000 81376 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.4205657Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4206155Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4207038Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4207488Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4208362Z [rank3]:E1204 09:59:44.705000 81377 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4208710Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4209588Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4210015Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4210865Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4211287Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4212163Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4212558Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4213462Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4214105Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4215854Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 250544128 and is now 10421731328. 
2025-12-04T10:13:48.4216223Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4216914Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4218144Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4218499Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4219215Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4219798Z [rank3]:E1204 09:59:44.705000 81377 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.4219899Z dist init r=2, world=4 2025-12-04T10:13:48.4219998Z dist init r=3, world=4 2025-12-04T10:13:48.4220093Z dist init r=1, world=4 2025-12-04T10:13:48.4220183Z dist init r=0, world=4 2025-12-04T10:13:48.4221341Z [rank2]:[W1204 09:59:45.218198827 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4222468Z [rank3]:[W1204 09:59:45.218650219 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4223614Z [rank1]:[W1204 09:59:45.219714339 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4224777Z [rank0]:[W1204 09:59:45.234135146 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4224878Z FAILED [52.1376s] [100%] 2025-12-04T10:13:48.4224887Z 2025-12-04T10:13:48.4225028Z =================================== FAILURES =================================== 2025-12-04T10:13:48.4225445Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda _ 2025-12-04T10:13:48.4225567Z Traceback (most recent call last): 2025-12-04T10:13:48.4226189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.4226292Z self._join_processes(fn) 2025-12-04T10:13:48.4226803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.4226926Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.4227458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.4227555Z raise RuntimeError(error) 2025-12-04T10:13:48.4227755Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.4227863Z Traceback (most recent call last): 2025-12-04T10:13:48.4228334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4228431Z getattr(self, test_name)() 2025-12-04T10:13:48.4228896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4228972Z fn() 2025-12-04T10:13:48.4229417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4229531Z method(*args, **kwargs) 2025-12-04T10:13:48.4229977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4230064Z method(*args, **kwargs) 2025-12-04T10:13:48.4230501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4230589Z with policy(): 2025-12-04T10:13:48.4231037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4231129Z raise RuntimeError(msg) 2025-12-04T10:13:48.4232322Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 604962816 and is now 10421731328. 
2025-12-04T10:13:48.4232333Z 2025-12-04T10:13:48.4232520Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4233210Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4233215Z 2025-12-04T10:13:48.4233447Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4233454Z 2025-12-04T10:13:48.4233459Z 2025-12-04T10:13:48.4233653Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.4233880Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.4234614Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0cb6a3efe573e986.xml - 2025-12-04T10:13:48.4234766Z =========================== short test summary info ============================ 2025-12-04T10:13:48.4235592Z FAILED [52.1376s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.4235699Z Traceback (most recent call last): 2025-12-04T10:13:48.4236181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4236303Z getattr(self, test_name)() 2025-12-04T10:13:48.4236774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4236849Z fn() 2025-12-04T10:13:48.4237300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4237387Z method(*args, **kwargs) 2025-12-04T10:13:48.4237827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4237919Z method(*args, **kwargs) 2025-12-04T10:13:48.4238360Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4238441Z with policy(): 2025-12-04T10:13:48.4238887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4238982Z raise RuntimeError(msg) 2025-12-04T10:13:48.4240150Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T10:13:48.4240158Z 2025-12-04T10:13:48.4240342Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4241053Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4241064Z 2025-12-04T10:13:48.4241292Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4241445Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
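The RuntimeError above comes from the test suite's CUDA memory-leak checker (enabled with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1), which snapshots caching-allocator and driver-level memory before the test and compares again after it. A rough, simplified sketch of that comparison; the real checker lives under torch/testing/_internal and this is only an approximation:

    import torch

    def assert_no_cuda_leak(run_test, device: int = 0) -> None:
        # Snapshot allocator and driver state before the test body runs.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        allocated_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes
        free_before, _total = torch.cuda.mem_get_info(device)    # driver-level free memory
        run_test()
        # Snapshot again and flag any growth that survived cache cleanup.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        allocated_after = torch.cuda.memory_allocated(device)
        free_after, _total = torch.cuda.mem_get_info(device)
        if allocated_after > allocated_before or free_after < free_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: allocator "
                f"{allocated_before} -> {allocated_after} bytes, driver free "
                f"{free_before} -> {free_after} bytes"
            )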
2025-12-04T10:13:48.4241604Z ====================== 1 failed, 32 deselected in 52.36s ======================= 2025-12-04T10:13:48.4241685Z Got exit code 1 2025-12-04T10:13:48.4241774Z Retrying single test... 2025-12-04T10:13:48.4242351Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-44c122860a547cf4.xml 2025-12-04T10:13:48.4242491Z ============================= test session starts ============================== 2025-12-04T10:13:48.4242799Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.4242886Z cachedir: .pytest_cache 2025-12-04T10:13:48.4243335Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.4243443Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.4243532Z configfile: pytest.ini 2025-12-04T10:13:48.4244000Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.4244194Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.4244951Z stepcurrent: skipping 18 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4245078Z Running 1 items in this shard 2025-12-04T10:13:48.4245084Z 2025-12-04T10:13:48.4246085Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda I1204 10:00:13.190000 82427 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 82479 2025-12-04T10:13:48.4246524Z I1204 10:00:13.191000 82427 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 82480 2025-12-04T10:13:48.4246962Z I1204 10:00:13.191000 82427 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 82481 2025-12-04T10:13:48.4247425Z I1204 10:00:13.192000 82427 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 82482 2025-12-04T10:13:48.4249206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.4249292Z _warn_cpu_init() 2025-12-04T10:13:48.4251063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.4251149Z _warn_cpu_init() 2025-12-04T10:13:48.4252948Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.4253031Z _warn_cpu_init() 2025-12-04T10:13:48.4254224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.4254321Z _init_core_state( 2025-12-04T10:13:48.4256047Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4256221Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4257231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.4257329Z _init_core_state( 2025-12-04T10:13:48.4258339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.4258433Z _init_core_state( 2025-12-04T10:13:48.4260137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4260328Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4262025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4262211Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4264211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.4264306Z _warn_cpu_init() 2025-12-04T10:13:48.4265322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T10:13:48.4265416Z _init_core_state( 2025-12-04T10:13:48.4267081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4267229Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4268758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4268906Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4270425Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4270569Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4272068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4272214Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4276183Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T10:13:48.4276583Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4281039Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.4281437Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4285983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.4286693Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4292613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T10:13:48.4293423Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4295026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4295214Z return func(*args, **kwargs) 2025-12-04T10:13:48.4296517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4296824Z return func(*args, **kwargs) 2025-12-04T10:13:48.4298182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4298395Z return func(*args, **kwargs) 2025-12-04T10:13:48.4299707Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4299905Z return func(*args, **kwargs) 2025-12-04T10:13:48.4301290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4301495Z return func(*args, **kwargs) 2025-12-04T10:13:48.4302892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4303078Z return func(*args, **kwargs) 2025-12-04T10:13:48.4304501Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4304687Z return func(*args, **kwargs) 2025-12-04T10:13:48.4306311Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4306514Z return func(*args, **kwargs) 2025-12-04T10:13:48.4308358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T10:13:48.4308555Z return func(*args, **kwargs) 2025-12-04T10:13:48.4309418Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4310428Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4312317Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4313433Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4315019Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4315638Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4317239Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4318166Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4319790Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4320588Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4322193Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4323057Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4324624Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4325463Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4327406Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 716111872 and is now 10532880384. 
2025-12-04T10:13:48.4327739Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4328330Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4329490Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4329818Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4330636Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4331152Z [rank0]:E1204 10:00:39.680000 82479 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.4331608Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4332110Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4333050Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4333796Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4334785Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4335182Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4336178Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4336666Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4337616Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4338161Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4339116Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4339565Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4340516Z [rank1]:E1204 10:00:39.681000 82480 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4340999Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4342760Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 609157120 and is now 10421731328. 2025-12-04T10:13:48.4343121Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4343808Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4345034Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4345400Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4346312Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4346799Z [rank1]:E1204 10:00:39.681000 82480 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.4347193Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4347660Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4348548Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4348996Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4349871Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4350247Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4351094Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4351518Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4352390Z [rank2]:E1204 10:00:39.681000 82481 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4352825Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4353666Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4354062Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4354906Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4355335Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4356943Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T10:13:48.4357262Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4357848Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4358927Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4359280Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4359910Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4360392Z [rank2]:E1204 10:00:39.681000 82481 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.4360786Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4361248Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4362128Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4362598Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4363472Z [rank3]:E1204 10:00:39.681000 82482 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4363816Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4364668Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4365121Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4365968Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4366399Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4367245Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4367636Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4368485Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4368923Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4370501Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 611254272 and is now 10421731328. 
2025-12-04T10:13:48.4370818Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4371404Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4372515Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4372843Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4373532Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4374235Z [rank3]:E1204 10:00:39.681000 82482 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.4374335Z dist init r=3, world=4 2025-12-04T10:13:48.4374429Z dist init r=0, world=4 2025-12-04T10:13:48.4374523Z dist init r=1, world=4 2025-12-04T10:13:48.4374616Z dist init r=2, world=4 2025-12-04T10:13:48.4375769Z [rank0]:[W1204 10:00:40.191702585 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4376956Z [rank3]:[W1204 10:00:40.191945464 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4378090Z [rank1]:[W1204 10:00:40.192292133 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4379484Z [rank2]:[W1204 10:00:40.205982164 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4379587Z FAILED [48.7966s] [100%] 2025-12-04T10:13:48.4379595Z 2025-12-04T10:13:48.4379746Z =================================== FAILURES =================================== 2025-12-04T10:13:48.4380136Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda _ 2025-12-04T10:13:48.4380252Z Traceback (most recent call last): 2025-12-04T10:13:48.4380804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.4380911Z self._join_processes(fn) 2025-12-04T10:13:48.4381499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.4381635Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.4382232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.4382350Z raise RuntimeError(error) 2025-12-04T10:13:48.4382581Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.4382769Z Traceback (most recent call last): 2025-12-04T10:13:48.4383309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4383414Z getattr(self, test_name)() 2025-12-04T10:13:48.4383950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4384037Z fn() 2025-12-04T10:13:48.4384539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4384646Z method(*args, **kwargs) 2025-12-04T10:13:48.4385182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4385285Z method(*args, **kwargs) 2025-12-04T10:13:48.4385792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4385884Z with policy(): 2025-12-04T10:13:48.4386388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4386492Z raise RuntimeError(msg) 2025-12-04T10:13:48.4387820Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 716111872 and is now 10532880384. 
2025-12-04T10:13:48.4387834Z 2025-12-04T10:13:48.4388046Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4388872Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4388879Z 2025-12-04T10:13:48.4389146Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4389152Z 2025-12-04T10:13:48.4389312Z Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.4389432Z Traceback (most recent call last): 2025-12-04T10:13:48.4389972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4390076Z getattr(self, test_name)() 2025-12-04T10:13:48.4390765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4390841Z fn() 2025-12-04T10:13:48.4391286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4391386Z method(*args, **kwargs) 2025-12-04T10:13:48.4391826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4391919Z method(*args, **kwargs) 2025-12-04T10:13:48.4392356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4392438Z with policy(): 2025-12-04T10:13:48.4392890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4392982Z raise RuntimeError(msg) 2025-12-04T10:13:48.4394145Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 609157120 and is now 10421731328. 
2025-12-04T10:13:48.4394162Z 2025-12-04T10:13:48.4394348Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4395057Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4395063Z 2025-12-04T10:13:48.4395299Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4395304Z 2025-12-04T10:13:48.4395443Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.4395555Z Traceback (most recent call last): 2025-12-04T10:13:48.4396034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4396129Z getattr(self, test_name)() 2025-12-04T10:13:48.4396605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4396726Z fn() 2025-12-04T10:13:48.4397171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4397268Z method(*args, **kwargs) 2025-12-04T10:13:48.4397707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4397801Z method(*args, **kwargs) 2025-12-04T10:13:48.4398237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4398317Z with policy(): 2025-12-04T10:13:48.4398763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4398854Z raise RuntimeError(msg) 2025-12-04T10:13:48.4400014Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 611254272 and is now 10421731328. 2025-12-04T10:13:48.4400047Z 2025-12-04T10:13:48.4400234Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4400914Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4400924Z 2025-12-04T10:13:48.4401153Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4401185Z 2025-12-04T10:13:48.4401189Z 2025-12-04T10:13:48.4401384Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.4401620Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T10:13:48.4402323Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-44c122860a547cf4.xml - 2025-12-04T10:13:48.4402477Z =========================== short test summary info ============================ 2025-12-04T10:13:48.4403302Z FAILED [48.7966s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.4403404Z Traceback (most recent call last): 2025-12-04T10:13:48.4403889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4403985Z getattr(self, test_name)() 2025-12-04T10:13:48.4404456Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4404534Z fn() 2025-12-04T10:13:48.4404978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4405074Z method(*args, **kwargs) 2025-12-04T10:13:48.4405539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4405629Z method(*args, **kwargs) 2025-12-04T10:13:48.4406071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4406153Z with policy(): 2025-12-04T10:13:48.4406733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4406900Z raise RuntimeError(msg) 2025-12-04T10:13:48.4409008Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 716111872 and is now 10532880384. 
2025-12-04T10:13:48.4409022Z 2025-12-04T10:13:48.4409236Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4409967Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4409973Z 2025-12-04T10:13:48.4410224Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4410228Z 2025-12-04T10:13:48.4410379Z Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.4410489Z Traceback (most recent call last): 2025-12-04T10:13:48.4411011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4411110Z getattr(self, test_name)() 2025-12-04T10:13:48.4411616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4411728Z fn() 2025-12-04T10:13:48.4412201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4412301Z method(*args, **kwargs) 2025-12-04T10:13:48.4412770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4412861Z method(*args, **kwargs) 2025-12-04T10:13:48.4413434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4413563Z with policy(): 2025-12-04T10:13:48.4414234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4414339Z raise RuntimeError(msg) 2025-12-04T10:13:48.4415658Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 609157120 and is now 10421731328. 
2025-12-04T10:13:48.4415667Z 2025-12-04T10:13:48.4415883Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4416647Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4416652Z 2025-12-04T10:13:48.4416918Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4416926Z 2025-12-04T10:13:48.4417083Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.4417199Z Traceback (most recent call last): 2025-12-04T10:13:48.4417744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4417858Z getattr(self, test_name)() 2025-12-04T10:13:48.4418396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4418515Z fn() 2025-12-04T10:13:48.4419016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4419123Z method(*args, **kwargs) 2025-12-04T10:13:48.4419620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4419723Z method(*args, **kwargs) 2025-12-04T10:13:48.4420225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4420319Z with policy(): 2025-12-04T10:13:48.4420928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4421036Z raise RuntimeError(msg) 2025-12-04T10:13:48.4422342Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 611254272 and is now 10421731328. 2025-12-04T10:13:48.4422356Z 2025-12-04T10:13:48.4424689Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4425586Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4425595Z 2025-12-04T10:13:48.4425838Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4425993Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:13:48.4426191Z ====================== 1 failed, 32 deselected in 49.01s ======================= 2025-12-04T10:13:48.4426280Z Got exit code 1 2025-12-04T10:13:48.4426904Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T10:13:48.4427261Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.4427831Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-668210e8e09c8dd9.xml 2025-12-04T10:13:48.4427971Z ============================= test session starts ============================== 2025-12-04T10:13:48.4428313Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.4428406Z cachedir: .pytest_cache 2025-12-04T10:13:48.4428858Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.4428970Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.4429060Z configfile: pytest.ini 2025-12-04T10:13:48.4429535Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.4429727Z collecting ... collected 60 items / 19 deselected / 41 selected 2025-12-04T10:13:48.4429852Z stepcurrent: skipping 19 already run items. 2025-12-04T10:13:48.4429961Z Running 14 items in this shard 2025-12-04T10:13:48.4429967Z 2025-12-04T10:13:48.4431009Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda I1204 10:01:06.149000 83532 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 83584 2025-12-04T10:13:48.4431449Z I1204 10:01:06.150000 83532 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 83585 2025-12-04T10:13:48.4431891Z I1204 10:01:06.151000 83532 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 83586 2025-12-04T10:13:48.4432371Z I1204 10:01:06.152000 83532 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 83587 2025-12-04T10:13:48.4434158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.4434248Z _warn_cpu_init() 2025-12-04T10:13:48.4436028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.4436116Z _warn_cpu_init() 2025-12-04T10:13:48.4437935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.4438022Z _warn_cpu_init() 2025-12-04T10:13:48.4438931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.4439049Z _init_core_state( 2025-12-04T10:13:48.4439956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.4440046Z _init_core_state( 2025-12-04T10:13:48.4440950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.4441058Z _init_core_state( 2025-12-04T10:13:48.4442570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4442715Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4444234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4444378Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4445911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4446054Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4447824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.4447908Z _warn_cpu_init() 2025-12-04T10:13:48.4448820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.4448911Z _init_core_state( 2025-12-04T10:13:48.4450453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4450601Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4452113Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4452288Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4454085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4454250Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4455982Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4456140Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4460636Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T10:13:48.4461066Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4465623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.4466111Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4470088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.4470456Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4474399Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T10:13:48.4474764Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4475451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4475546Z return func(*args, **kwargs) 2025-12-04T10:13:48.4476224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4476316Z return func(*args, **kwargs) 2025-12-04T10:13:48.4477011Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4477107Z return func(*args, **kwargs) 2025-12-04T10:13:48.4477779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4477877Z return func(*args, **kwargs) 2025-12-04T10:13:48.4478545Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4478780Z return func(*args, **kwargs) 2025-12-04T10:13:48.4479681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4479789Z return func(*args, **kwargs) 2025-12-04T10:13:48.4480543Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4480724Z return func(*args, **kwargs) 2025-12-04T10:13:48.4481474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4481588Z return func(*args, **kwargs) 2025-12-04T10:13:48.4482572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T10:13:48.4482716Z return func(*args, **kwargs) 2025-12-04T10:13:48.4483179Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4483709Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4484712Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4485268Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4486251Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4486644Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4487599Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4488091Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4489043Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4489531Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4490519Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4490961Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4491959Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4492387Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4494258Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 607059968 and is now 10421731328. 
2025-12-04T10:13:48.4494622Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4495323Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4496591Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4496957Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4497703Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4498243Z [rank1]:E1204 10:01:32.141000 83585 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.4498702Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4499224Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4500250Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4500757Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4501742Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4502130Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4503084Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4503573Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4504524Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4505043Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4506186Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4506581Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4507426Z [rank0]:E1204 10:01:32.141000 83584 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4507856Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4509478Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 711917568 and is now 10532880384. 2025-12-04T10:13:48.4509796Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4510382Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4511504Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4511857Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4512488Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4512967Z [rank0]:E1204 10:01:32.141000 83584 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.4513391Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4513860Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4514746Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4515193Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4516069Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4516418Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4517272Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4517702Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4518567Z [rank2]:E1204 10:01:32.142000 83586 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4518997Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4519851Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4520246Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4521099Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4521528Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4523149Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T10:13:48.4523469Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4524077Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4525205Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4525528Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4526155Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4526659Z [rank2]:E1204 10:01:32.142000 83586 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.4527061Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4527526Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4528403Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4528851Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4529726Z [rank3]:E1204 10:01:32.143000 83587 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4530073Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4530944Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4531381Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4532223Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4532656Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4533554Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4534162Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4535115Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4535636Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4537445Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 609157120 and is now 10421731328. 
2025-12-04T10:13:48.4537850Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4538513Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4539776Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4540172Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4540882Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4541432Z [rank3]:E1204 10:01:32.143000 83587 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.4541533Z dist init r=0, world=4 2025-12-04T10:13:48.4541627Z dist init r=1, world=4 2025-12-04T10:13:48.4541730Z dist init r=3, world=4 2025-12-04T10:13:48.4541823Z dist init r=2, world=4 2025-12-04T10:13:48.4542981Z [rank1]:[W1204 10:01:32.649150277 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4544134Z [rank0]:[W1204 10:01:32.650527694 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4545294Z [rank2]:[W1204 10:01:32.654697610 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4546483Z [rank3]:[W1204 10:01:32.656305451 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4546570Z FAILED [47.2503s] [ 7%] 2025-12-04T10:13:48.4546578Z 2025-12-04T10:13:48.4546713Z =================================== FAILURES =================================== 2025-12-04T10:13:48.4547099Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda _ 2025-12-04T10:13:48.4547203Z Traceback (most recent call last): 2025-12-04T10:13:48.4547690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.4547788Z self._join_processes(fn) 2025-12-04T10:13:48.4548310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.4548433Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.4548999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.4549107Z raise RuntimeError(error) 2025-12-04T10:13:48.4549315Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.4549418Z Traceback (most recent call last): 2025-12-04T10:13:48.4549901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4550023Z getattr(self, test_name)() 2025-12-04T10:13:48.4550496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4550573Z fn() 2025-12-04T10:13:48.4551019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4551116Z method(*args, **kwargs) 2025-12-04T10:13:48.4551560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4551651Z method(*args, **kwargs) 2025-12-04T10:13:48.4552095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4552204Z with policy(): 2025-12-04T10:13:48.4552655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4552752Z raise RuntimeError(msg) 2025-12-04T10:13:48.4553940Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 607059968 and is now 10421731328. 
2025-12-04T10:13:48.4553953Z 2025-12-04T10:13:48.4554140Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4554868Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4554876Z 2025-12-04T10:13:48.4555118Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4555122Z 2025-12-04T10:13:48.4555126Z 2025-12-04T10:13:48.4555318Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.4555557Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.4556285Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-668210e8e09c8dd9.xml - 2025-12-04T10:13:48.4556431Z =========================== short test summary info ============================ 2025-12-04T10:13:48.4557302Z FAILED [47.2503s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.4557407Z Traceback (most recent call last): 2025-12-04T10:13:48.4557894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4557987Z getattr(self, test_name)() 2025-12-04T10:13:48.4558462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4558549Z fn() 2025-12-04T10:13:48.4558994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4559086Z method(*args, **kwargs) 2025-12-04T10:13:48.4559529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4559616Z method(*args, **kwargs) 2025-12-04T10:13:48.4560088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4560172Z with policy(): 2025-12-04T10:13:48.4560614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4560711Z raise RuntimeError(msg) 2025-12-04T10:13:48.4561904Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 607059968 and is now 10421731328. 2025-12-04T10:13:48.4561935Z 2025-12-04T10:13:48.4562124Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4562848Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4562853Z 2025-12-04T10:13:48.4563107Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4563264Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
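The failure above is raised by the harness's CUDA memory-leak check (exit code 10), not by an assertion in the test body: the check compares caching-allocator and CUDA-driver memory on each device before and after the test, and the repro line printed in the log re-runs only this test with the check enabled. Below is a minimal illustrative sketch of such a before/after check using only public torch.cuda APIs; it is not the actual CUDAMemoryLeakCheck implementation, and the real check's retry and thresholding behavior is omitted.

# Illustrative before/after CUDA memory check, similar in spirit to what
# PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 turns on; not the harness's real code.
import torch

def run_with_leak_check(fn, device: int = 0) -> None:
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)     # caching-allocator bytes
    free_before, _total = torch.cuda.mem_get_info(device)  # driver-level free bytes

    fn()  # the test body

    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)

    if alloc_after > alloc_before or free_after < free_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after} bytes, driver free "
            f"{free_before} -> {free_after} bytes"
        )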
2025-12-04T10:13:48.4563419Z ====================== 1 failed, 19 deselected in 47.47s ======================= 2025-12-04T10:13:48.4563509Z Got exit code 1 2025-12-04T10:13:48.4563597Z Retrying single test... 2025-12-04T10:13:48.4564142Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9448ec7a0a61b5a6.xml 2025-12-04T10:13:48.4564289Z ============================= test session starts ============================== 2025-12-04T10:13:48.4564591Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.4564681Z cachedir: .pytest_cache 2025-12-04T10:13:48.4565144Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.4565249Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.4565344Z configfile: pytest.ini 2025-12-04T10:13:48.4565810Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.4565997Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.4566828Z stepcurrent: skipping 19 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4566926Z Running 1 items in this shard 2025-12-04T10:13:48.4566931Z 2025-12-04T10:13:48.4567965Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda I1204 10:01:58.179000 84637 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 84689 2025-12-04T10:13:48.4568402Z I1204 10:01:58.180000 84637 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 84690 2025-12-04T10:13:48.4568837Z I1204 10:01:58.181000 84637 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 84691 2025-12-04T10:13:48.4569271Z I1204 10:01:58.182000 84637 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 84692 2025-12-04T10:13:48.4571093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.4571184Z _warn_cpu_init() 2025-12-04T10:13:48.4572953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.4573073Z _warn_cpu_init() 2025-12-04T10:13:48.4575240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.4575375Z _warn_cpu_init() 2025-12-04T10:13:48.4576407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.4576502Z _init_core_state( 2025-12-04T10:13:48.4577536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.4577632Z _init_core_state( 2025-12-04T10:13:48.4578863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.4578961Z _init_core_state( 2025-12-04T10:13:48.4580666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4580838Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4582609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4582783Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4584475Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4584645Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4586702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.4586805Z _warn_cpu_init() 2025-12-04T10:13:48.4587841Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.4587941Z _init_core_state( 2025-12-04T10:13:48.4589636Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4589835Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4591538Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4591712Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4593221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4593365Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4594872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4595012Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4598985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T10:13:48.4599334Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4603334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.4603710Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4607939Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.4608337Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4612563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T10:13:48.4612933Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4613883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4614000Z return func(*args, **kwargs) 2025-12-04T10:13:48.4614757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4614873Z return func(*args, **kwargs) 2025-12-04T10:13:48.4615636Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4615742Z return func(*args, **kwargs) 2025-12-04T10:13:48.4616502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4616605Z return func(*args, **kwargs) 2025-12-04T10:13:48.4617398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4617504Z return func(*args, **kwargs) 2025-12-04T10:13:48.4618251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4618390Z return func(*args, **kwargs) 2025-12-04T10:13:48.4619146Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4619251Z return func(*args, **kwargs) 2025-12-04T10:13:48.4619998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4620099Z return func(*args, **kwargs) 2025-12-04T10:13:48.4621096Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T10:13:48.4621226Z return func(*args, **kwargs) 2025-12-04T10:13:48.4621685Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4622351Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4624166Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4625098Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4626999Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4627650Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4629171Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4629969Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4631473Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4632464Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4634340Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4635094Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4636832Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4637855Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4641261Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 720306176 and is now 10532880384. 
2025-12-04T10:13:48.4642022Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4643212Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4645522Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4646335Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4647595Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4648688Z [rank0]:E1204 10:02:24.642000 84689 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.4649484Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4650507Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4652183Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4653050Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4655126Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4656038Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4657799Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4658755Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4659974Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4660465Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4661429Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4661875Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4662930Z [rank1]:E1204 10:02:24.642000 84690 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4663419Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4665218Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 609157120 and is now 10421731328. 2025-12-04T10:13:48.4665727Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4666363Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4667593Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4668241Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4668940Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4669462Z [rank1]:E1204 10:02:24.642000 84690 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.4669906Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4670418Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4671386Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4671882Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4672863Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4673255Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4674181Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4674757Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4675649Z [rank2]:E1204 10:02:24.643000 84691 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4676107Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4677060Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4677475Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4678376Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4679223Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4681036Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 607059968 and is now 10421731328. 2025-12-04T10:13:48.4681396Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4682117Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4683390Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4683756Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4684479Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4685026Z [rank2]:E1204 10:02:24.643000 84691 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.4685481Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4686008Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4687045Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4687554Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4688537Z [rank3]:E1204 10:02:24.644000 84692 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4688940Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4689886Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4690377Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4691418Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4691903Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4692802Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4693286Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4694451Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4694940Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4696750Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 604962816 and is now 10421731328. 
2025-12-04T10:13:48.4697143Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4697804Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4699078Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4699439Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4700161Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4700702Z [rank3]:E1204 10:02:24.644000 84692 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.4700811Z dist init r=1, world=4 2025-12-04T10:13:48.4700906Z dist init r=2, world=4 2025-12-04T10:13:48.4700996Z dist init r=3, world=4 2025-12-04T10:13:48.4701093Z dist init r=0, world=4 2025-12-04T10:13:48.4702278Z [rank1]:[W1204 10:02:25.151205396 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4703431Z [rank2]:[W1204 10:02:25.153334182 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4704562Z [rank0]:[W1204 10:02:25.155066962 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4705786Z [rank3]:[W1204 10:02:25.156516753 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4705884Z FAILED [48.8004s] [100%] 2025-12-04T10:13:48.4705891Z 2025-12-04T10:13:48.4706016Z =================================== FAILURES =================================== 2025-12-04T10:13:48.4706432Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda _ 2025-12-04T10:13:48.4706540Z Traceback (most recent call last): 2025-12-04T10:13:48.4707023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.4707123Z self._join_processes(fn) 2025-12-04T10:13:48.4707634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.4707790Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.4708326Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.4708425Z raise RuntimeError(error) 2025-12-04T10:13:48.4708637Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.4708741Z Traceback (most recent call last): 2025-12-04T10:13:48.4709215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4709360Z getattr(self, test_name)() 2025-12-04T10:13:48.4709827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4709907Z fn() 2025-12-04T10:13:48.4710355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4710445Z method(*args, **kwargs) 2025-12-04T10:13:48.4710896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4710986Z method(*args, **kwargs) 2025-12-04T10:13:48.4711425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4711513Z with policy(): 2025-12-04T10:13:48.4711960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4712060Z raise RuntimeError(msg) 2025-12-04T10:13:48.4713252Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 609157120 and is now 10421731328. 
2025-12-04T10:13:48.4713261Z 2025-12-04T10:13:48.4713486Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4714211Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4714217Z 2025-12-04T10:13:48.4714453Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4714458Z 2025-12-04T10:13:48.4714610Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.4714716Z Traceback (most recent call last): 2025-12-04T10:13:48.4715198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4715291Z getattr(self, test_name)() 2025-12-04T10:13:48.4715764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4715844Z fn() 2025-12-04T10:13:48.4716286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4716375Z method(*args, **kwargs) 2025-12-04T10:13:48.4716823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4716937Z method(*args, **kwargs) 2025-12-04T10:13:48.4717383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4717471Z with policy(): 2025-12-04T10:13:48.4717914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4718015Z raise RuntimeError(msg) 2025-12-04T10:13:48.4719236Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 607059968 and is now 10421731328. 
2025-12-04T10:13:48.4719242Z 2025-12-04T10:13:48.4719435Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4720156Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4720188Z 2025-12-04T10:13:48.4720418Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4720428Z 2025-12-04T10:13:48.4720570Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.4720671Z Traceback (most recent call last): 2025-12-04T10:13:48.4721156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4721249Z getattr(self, test_name)() 2025-12-04T10:13:48.4721718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4721799Z fn() 2025-12-04T10:13:48.4722241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4722337Z method(*args, **kwargs) 2025-12-04T10:13:48.4722775Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4722865Z method(*args, **kwargs) 2025-12-04T10:13:48.4723314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4723397Z with policy(): 2025-12-04T10:13:48.4723843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4723945Z raise RuntimeError(msg) 2025-12-04T10:13:48.4725160Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T10:13:48.4725167Z 2025-12-04T10:13:48.4725357Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4726256Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4726261Z 2025-12-04T10:13:48.4726678Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4726686Z 2025-12-04T10:13:48.4726690Z 2025-12-04T10:13:48.4726901Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.4727190Z Process 1 terminated with exit code 10, terminating remaining processes. 
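Each run also ends with a ProcessGroupNCCL warning that destroy_process_group() was not called before program exit, and an earlier barrier() warning about picking the device from the current context. A minimal sketch of the init/teardown pattern those warnings ask for, assuming a torchrun-style launcher that sets RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT (illustrative only; this is not the test harness's own setup code):

# Illustrative NCCL process-group lifecycle; assumes launch via torchrun so the
# rendezvous environment variables are already set. Not the harness's own code.
import torch
import torch.distributed as dist

def main() -> None:
    dist.init_process_group(backend="nccl")
    # Pin one GPU per rank so collectives (including barrier()) do not have to
    # guess the device from the current context, as the warning above describes.
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
    try:
        dist.barrier()  # placeholder for the real distributed work
    finally:
        dist.destroy_process_group()  # explicit teardown; avoids the resource-leak warning

if __name__ == "__main__":
    main()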
2025-12-04T10:13:48.4728538Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9448ec7a0a61b5a6.xml - 2025-12-04T10:13:48.4728869Z =========================== short test summary info ============================ 2025-12-04T10:13:48.4730053Z FAILED [48.8004s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.4730179Z Traceback (most recent call last): 2025-12-04T10:13:48.4730713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4730873Z getattr(self, test_name)() 2025-12-04T10:13:48.4731384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4731474Z fn() 2025-12-04T10:13:48.4731958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4732063Z method(*args, **kwargs) 2025-12-04T10:13:48.4732559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4732654Z method(*args, **kwargs) 2025-12-04T10:13:48.4733168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4733360Z with policy(): 2025-12-04T10:13:48.4734024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4734137Z raise RuntimeError(msg) 2025-12-04T10:13:48.4735484Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 609157120 and is now 10421731328. 
2025-12-04T10:13:48.4735492Z 2025-12-04T10:13:48.4735703Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4736526Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4736535Z 2025-12-04T10:13:48.4736795Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4736800Z 2025-12-04T10:13:48.4736966Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.4737082Z Traceback (most recent call last): 2025-12-04T10:13:48.4737624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4737777Z getattr(self, test_name)() 2025-12-04T10:13:48.4738308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4738401Z fn() 2025-12-04T10:13:48.4738900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4738998Z method(*args, **kwargs) 2025-12-04T10:13:48.4739501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4739599Z method(*args, **kwargs) 2025-12-04T10:13:48.4740093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4740193Z with policy(): 2025-12-04T10:13:48.4740694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4740804Z raise RuntimeError(msg) 2025-12-04T10:13:48.4742187Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 607059968 and is now 10421731328. 
2025-12-04T10:13:48.4742193Z 2025-12-04T10:13:48.4742410Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4743224Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4743230Z 2025-12-04T10:13:48.4743517Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4743523Z 2025-12-04T10:13:48.4743685Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.4743802Z Traceback (most recent call last): 2025-12-04T10:13:48.4744349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4744454Z getattr(self, test_name)() 2025-12-04T10:13:48.4745100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4745187Z fn() 2025-12-04T10:13:48.4745800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4745889Z method(*args, **kwargs) 2025-12-04T10:13:48.4746337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4746427Z method(*args, **kwargs) 2025-12-04T10:13:48.4746869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4746954Z with policy(): 2025-12-04T10:13:48.4747401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4747497Z raise RuntimeError(msg) 2025-12-04T10:13:48.4748682Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T10:13:48.4748689Z 2025-12-04T10:13:48.4748878Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4749597Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4749603Z 2025-12-04T10:13:48.4749861Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4750028Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.4750183Z ====================== 1 failed, 32 deselected in 49.02s ======================= 2025-12-04T10:13:48.4750272Z Got exit code 1 2025-12-04T10:13:48.4750362Z Retrying single test... 
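The FSDP warnings repeated in each of these runs all point at the same initialization pattern: the module is wrapped while still on CPU, and `device_id` is passed as a bare "cuda" with no index, so FSDP falls back to whatever the current device happens to be. A minimal sketch of the setup those warnings recommend, with a hypothetical placeholder module (illustrative only; the real test builds its own mixture-of-experts model):

# Illustrative FSDP wrapping that follows the warnings above: pin the device per
# rank and pass an explicit device index. nn.Linear stands in for the real model.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model() -> FSDP:
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
    module = nn.Linear(1024, 1024)  # hypothetical stand-in for the test's MoE model
    return FSDP(
        module,
        device_id=torch.cuda.current_device(),  # explicit index instead of bare "cuda"
        sync_module_states=True,                 # needs GPU parameters, per the warning
    )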
2025-12-04T10:13:48.4750911Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8d6fd75ad2c1f260.xml 2025-12-04T10:13:48.4751058Z ============================= test session starts ============================== 2025-12-04T10:13:48.4751361Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.4751454Z cachedir: .pytest_cache 2025-12-04T10:13:48.4751912Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.4752021Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.4752118Z configfile: pytest.ini 2025-12-04T10:13:48.4752589Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.4752807Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.4753607Z stepcurrent: skipping 19 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4753707Z Running 1 items in this shard 2025-12-04T10:13:48.4753712Z 2025-12-04T10:13:48.4754748Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda I1204 10:02:51.189000 85742 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 85794 2025-12-04T10:13:48.4755234Z I1204 10:02:51.190000 85742 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 85795 2025-12-04T10:13:48.4755668Z I1204 10:02:51.191000 85742 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 85796 2025-12-04T10:13:48.4756103Z I1204 10:02:51.192000 85742 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 85797 2025-12-04T10:13:48.4757891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.4758011Z _warn_cpu_init() 2025-12-04T10:13:48.4759777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.4759870Z _warn_cpu_init() 2025-12-04T10:13:48.4761635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.4761728Z _warn_cpu_init() 2025-12-04T10:13:48.4762671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.4762758Z _init_core_state( 2025-12-04T10:13:48.4763674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.4763760Z _init_core_state( 2025-12-04T10:13:48.4764680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T10:13:48.4764764Z _init_core_state( 2025-12-04T10:13:48.4766305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4766453Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4767951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4768132Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4769629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4769775Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4771549Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.4771672Z _warn_cpu_init() 2025-12-04T10:13:48.4772586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 
2025-12-04T10:13:48.4772674Z _init_core_state( 2025-12-04T10:13:48.4774512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4774677Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4776424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4776586Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4778284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4778445Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4780352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.4780512Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.4785075Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.4785513Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4790016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.4790559Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4794702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.4795055Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4799018Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.4799368Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4800075Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
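The stream-mismatch warnings above name their own switch for the case where the mismatch is intentional; a one-line sketch, using the exact call quoted in the warning text:

    import torch

    # Silence the AccumulateGrad stream-mismatch warning when the mismatch is intentional
    # (this is the call the warning itself suggests).
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)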
2025-12-04T10:13:48.4800183Z return func(*args, **kwargs) 2025-12-04T10:13:48.4800862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4800964Z return func(*args, **kwargs) 2025-12-04T10:13:48.4801634Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4801751Z return func(*args, **kwargs) 2025-12-04T10:13:48.4802429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4802524Z return func(*args, **kwargs) 2025-12-04T10:13:48.4803196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4803285Z return func(*args, **kwargs) 2025-12-04T10:13:48.4803953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4804055Z return func(*args, **kwargs) 2025-12-04T10:13:48.4804720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4804821Z return func(*args, **kwargs) 2025-12-04T10:13:48.4805486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4805577Z return func(*args, **kwargs) 2025-12-04T10:13:48.4806492Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
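The FSDP `device_id` warnings and the barrier() warning above both come from leaving the device implicit. A minimal sketch of the explicit per-rank setup they suggest, assuming one GPU per rank and that MASTER_ADDR/MASTER_PORT are already set (function and variable names are illustrative):

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def setup_and_wrap(module, rank, world_size):
        # Pin the process to its GPU before any collective or FSDP construction.
        torch.cuda.set_device(rank)
        # Passing device_id here is what the barrier() warning asks for.
        dist.init_process_group("nccl", rank=rank, world_size=world_size,
                                device_id=torch.device("cuda", rank))
        # An explicit device_id lets FSDP run sharding init on the GPU instead of CPU.
        return FSDP(module, device_id=rank)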
2025-12-04T10:13:48.4806584Z return func(*args, **kwargs) 2025-12-04T10:13:48.4806990Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4807470Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4808353Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4808809Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4809683Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4810037Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4810920Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4811352Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4812207Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4812670Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4813582Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4814186Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4815181Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4815671Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4817473Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 720306176 and is now 10532880384. 
2025-12-04T10:13:48.4817834Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4818491Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4819772Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4820166Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4820886Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4821426Z [rank0]:E1204 10:03:17.875000 85794 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.4821877Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4822408Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4823398Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4823901Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4824912Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4825313Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4826289Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4826743Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4827589Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4828017Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4828868Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4829286Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4830141Z [rank1]:E1204 10:03:17.875000 85795 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4830569Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4832164Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 609157120 and is now 10421731328. 2025-12-04T10:13:48.4832484Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4833062Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4834217Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4834537Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4835172Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4835651Z [rank1]:E1204 10:03:17.875000 85795 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.4836053Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4836521Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4837404Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4837877Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4838752Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4839102Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4839971Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4840402Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4841244Z [rank2]:E1204 10:03:17.876000 85796 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4841696Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4842540Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4842932Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4843787Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4844217Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4845819Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 607059968 and is now 10421731328. 2025-12-04T10:13:48.4846165Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4846746Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4847876Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4848195Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4848829Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4849311Z [rank2]:E1204 10:03:17.876000 85796 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.4849713Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4850202Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4851088Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4851538Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4852438Z [rank3]:E1204 10:03:17.876000 85797 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4852793Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4853875Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4854360Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4855345Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4855824Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4856783Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4857225Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4858184Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4858667Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4860504Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 604962816 and is now 10421731328. 
2025-12-04T10:13:48.4860863Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4861516Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4862786Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4863141Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4863851Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4864386Z [rank3]:E1204 10:03:17.876000 85797 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.4864517Z dist init r=2, world=4 2025-12-04T10:13:48.4864612Z dist init r=1, world=4 2025-12-04T10:13:48.4864705Z dist init r=0, world=4 2025-12-04T10:13:48.4864805Z dist init r=3, world=4 2025-12-04T10:13:48.4866153Z [rank2]:[W1204 10:03:18.387179651 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4867312Z [rank1]:[W1204 10:03:18.387305049 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4868573Z [rank0]:[W1204 10:03:18.390605232 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4869666Z [rank3]:[W1204 10:03:18.394082856 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
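The ProcessGroupNCCL warnings above are the standard reminder to tear the group down explicitly before the program exits. A minimal sketch of that shutdown pattern, assuming the rendezvous environment variables are already set (names are illustrative):

    import torch.distributed as dist

    def run(rank, world_size):
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            pass  # test or training body goes here
        finally:
            # Explicit teardown avoids the destroy_process_group() warning at exit.
            dist.destroy_process_group()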
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.4869794Z FAILED [48.5677s] [100%] 2025-12-04T10:13:48.4869801Z 2025-12-04T10:13:48.4869940Z =================================== FAILURES =================================== 2025-12-04T10:13:48.4870362Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda _ 2025-12-04T10:13:48.4870478Z Traceback (most recent call last): 2025-12-04T10:13:48.4871001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.4871110Z self._join_processes(fn) 2025-12-04T10:13:48.4871676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.4871815Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.4872395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.4872502Z raise RuntimeError(error) 2025-12-04T10:13:48.4872729Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.4872949Z Traceback (most recent call last): 2025-12-04T10:13:48.4873476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4873583Z getattr(self, test_name)() 2025-12-04T10:13:48.4874074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4874158Z fn() 2025-12-04T10:13:48.4874629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4874725Z method(*args, **kwargs) 2025-12-04T10:13:48.4875193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4875285Z method(*args, **kwargs) 2025-12-04T10:13:48.4875746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4875841Z with policy(): 2025-12-04T10:13:48.4876314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4876417Z raise RuntimeError(msg) 2025-12-04T10:13:48.4877711Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 720306176 and is now 10532880384. 
2025-12-04T10:13:48.4877720Z 2025-12-04T10:13:48.4877918Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4878843Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4878886Z 2025-12-04T10:13:48.4879309Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4879315Z 2025-12-04T10:13:48.4879319Z 2025-12-04T10:13:48.4879544Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.4879809Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.4880618Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8d6fd75ad2c1f260.xml - 2025-12-04T10:13:48.4880784Z =========================== short test summary info ============================ 2025-12-04T10:13:48.4881804Z FAILED [48.5677s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.4881929Z Traceback (most recent call last): 2025-12-04T10:13:48.4882480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4882596Z getattr(self, test_name)() 2025-12-04T10:13:48.4883130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4883215Z fn() 2025-12-04T10:13:48.4883722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4883822Z method(*args, **kwargs) 2025-12-04T10:13:48.4884319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4884427Z method(*args, **kwargs) 2025-12-04T10:13:48.4884921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4885018Z with policy(): 2025-12-04T10:13:48.4885517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4885621Z raise RuntimeError(msg) 2025-12-04T10:13:48.4887017Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 720306176 and is now 10532880384. 2025-12-04T10:13:48.4887023Z 2025-12-04T10:13:48.4887236Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4888057Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4888063Z 2025-12-04T10:13:48.4888321Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4888497Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:13:48.4888673Z ====================== 1 failed, 32 deselected in 48.79s ======================= 2025-12-04T10:13:48.4888768Z Got exit code 1 2025-12-04T10:13:48.4889499Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.4889949Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.4890569Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aa2eb835ecdd4375.xml 2025-12-04T10:13:48.4890734Z ============================= test session starts ============================== 2025-12-04T10:13:48.4891181Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.4891324Z cachedir: .pytest_cache 2025-12-04T10:13:48.4891896Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.4892003Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.4892097Z configfile: pytest.ini 2025-12-04T10:13:48.4892569Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.4892758Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T10:13:48.4892883Z stepcurrent: skipping 20 already run items. 2025-12-04T10:13:48.4893005Z Running 13 items in this shard 2025-12-04T10:13:48.4893010Z 2025-12-04T10:13:48.4894241Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda I1204 10:03:44.739000 86847 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 86899 2025-12-04T10:13:48.4894739Z I1204 10:03:44.740000 86847 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 86900 2025-12-04T10:13:48.4895227Z I1204 10:03:44.741000 86847 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 86901 2025-12-04T10:13:48.4895713Z I1204 10:03:44.742000 86847 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 86902 2025-12-04T10:13:48.4896688Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.4896819Z return wrapper_cls(module, **kwargs) 2025-12-04T10:13:48.4897784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.4897910Z return wrapper_cls(module, **kwargs) 2025-12-04T10:13:48.4898905Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
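The FutureWarning above points at DistributedDataParallel as the replacement for the deprecated NO_SHARD strategy, since NO_SHARD keeps full parameters on every rank just as DDP does. A minimal sketch of that substitution, assuming the process group is already initialized (names are illustrative):

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_without_sharding(model, rank):
        # DDP replicates the full model per rank, matching NO_SHARD's behaviour.
        return DDP(model.cuda(rank), device_ids=[rank])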
2025-12-04T10:13:48.4899023Z return wrapper_cls(module, **kwargs) 2025-12-04T10:13:48.4901044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.4901140Z _warn_cpu_init() 2025-12-04T10:13:48.4903140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.4903235Z _warn_cpu_init() 2025-12-04T10:13:48.4905254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.4905378Z _warn_cpu_init() 2025-12-04T10:13:48.4906499Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.4906606Z return wrapper_cls(module, **kwargs) 2025-12-04T10:13:48.4908373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.4908500Z _warn_cpu_init() 2025-12-04T10:13:48.4909372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.4909603Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T10:13:48.4910477Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.4910705Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T10:13:48.4911578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.4911800Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T10:13:48.4912680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.4912932Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T10:13:48.4913618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4913713Z return func(*args, **kwargs) 2025-12-04T10:13:48.4914389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4914488Z return func(*args, **kwargs) 2025-12-04T10:13:48.4915160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4915254Z return func(*args, **kwargs) 2025-12-04T10:13:48.4920087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4920200Z return func(*args, **kwargs) 2025-12-04T10:13:48.4920880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4921042Z return func(*args, **kwargs) 2025-12-04T10:13:48.4921715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4921815Z return func(*args, **kwargs) 2025-12-04T10:13:48.4922483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4922614Z return func(*args, **kwargs) 2025-12-04T10:13:48.4923281Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.4923372Z return func(*args, **kwargs) 2025-12-04T10:13:48.4924255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.4924346Z return func(*args, **kwargs) 2025-12-04T10:13:48.4928366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive.
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.4928714Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4932708Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.4933058Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4940951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.4941770Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4950106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.4950962Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.4951803Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4952790Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4954685Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4955654Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4957628Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4958386Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4959921Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4960702Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4962309Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4963119Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4964764Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4965470Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4967205Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4968067Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4970889Z [rank3]:E1204 10:03:52.661000 86902 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 176640 on device 3. CUDA driver allocated memory was 611254272 and is now 674168832. 2025-12-04T10:13:48.4971481Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4972268Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4973525Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.4974050Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4974768Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4975310Z [rank3]:E1204 10:03:52.661000 86902 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.4975760Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4976289Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4977284Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4977788Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4979043Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4979446Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4980400Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4980883Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4981837Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4982319Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4983315Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4983755Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4984724Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4985268Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.4986960Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 0. CUDA driver allocated memory was 714014720 and is now 783220736. 2025-12-04T10:13:48.4987321Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4988012Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.4989175Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.4989535Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.4990249Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.4990987Z [rank0]:E1204 10:03:52.661000 86899 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.4991382Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.4991848Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.4992724Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.4993201Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.4994069Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.4994422Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.4995275Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4995701Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4996549Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.4996999Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.4997848Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.4998237Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.4999113Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.4999541Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5001044Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 176640 on device 2. CUDA driver allocated memory was 609157120 and is now 674168832. 
2025-12-04T10:13:48.5001394Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5001968Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5003012Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5003328Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5003961Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5004440Z [rank2]:E1204 10:03:52.662000 86901 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.5004834Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5005302Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5006203Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5006652Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5007524Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5007876Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5008724Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5009149Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5010020Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5010447Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5011297Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5011712Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5012572Z [rank1]:E1204 10:03:52.662000 86900 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5013000Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5014836Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 176640 on device 1. CUDA driver allocated memory was 607059968 and is now 674168832. 2025-12-04T10:13:48.5015237Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5015887Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5017056Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5017412Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5018131Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5018669Z [rank1]:E1204 10:03:52.662000 86900 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.5018767Z dist init r=3, world=4 2025-12-04T10:13:48.5018866Z dist init r=0, world=4 2025-12-04T10:13:48.5018986Z dist init r=1, world=4 2025-12-04T10:13:48.5019079Z dist init r=2, world=4 2025-12-04T10:13:48.5020235Z [rank0]:[W1204 10:03:53.175915187 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.5020335Z FAILED [10.5039s] [ 7%] 2025-12-04T10:13:48.5020346Z 2025-12-04T10:13:48.5020490Z =================================== FAILURES =================================== 2025-12-04T10:13:48.5020817Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda _ 2025-12-04T10:13:48.5020935Z Traceback (most recent call last): 2025-12-04T10:13:48.5021482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.5021589Z self._join_processes(fn) 2025-12-04T10:13:48.5022176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.5022311Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.5022944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.5023057Z raise RuntimeError(error) 2025-12-04T10:13:48.5023288Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.5023401Z Traceback (most recent call last): 2025-12-04T10:13:48.5023936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5024073Z getattr(self, test_name)() 2025-12-04T10:13:48.5024602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5024686Z fn() 2025-12-04T10:13:48.5025187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5025293Z method(*args, **kwargs) 2025-12-04T10:13:48.5025788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5025998Z method(*args, **kwargs) 2025-12-04T10:13:48.5026439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5026549Z with policy(): 2025-12-04T10:13:48.5026997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5027092Z raise RuntimeError(msg) 2025-12-04T10:13:48.5028195Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 176640 on device 1. CUDA driver allocated memory was 607059968 and is now 674168832. 
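The ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit") is about explicit teardown of the default process group. A minimal sketch of the recommended shutdown, assuming a torchrun-style launch where MASTER_ADDR, RANK and WORLD_SIZE are already set in the environment:

    import torch.distributed as dist

    def main():
        dist.init_process_group(backend="nccl")  # assumes torchrun-provided env vars
        try:
            ...  # test or training body goes here
        finally:
            # Explicit teardown avoids the ProcessGroupNCCL shutdown warning above.
            dist.destroy_process_group()

    if __name__ == "__main__":
        main()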
2025-12-04T10:13:48.5028207Z 2025-12-04T10:13:48.5028395Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5029026Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5029033Z 2025-12-04T10:13:48.5029270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5029275Z 2025-12-04T10:13:48.5029414Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.5029522Z Traceback (most recent call last): 2025-12-04T10:13:48.5030003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5030096Z getattr(self, test_name)() 2025-12-04T10:13:48.5030592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5030667Z fn() 2025-12-04T10:13:48.5031108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5031201Z method(*args, **kwargs) 2025-12-04T10:13:48.5031637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5031729Z method(*args, **kwargs) 2025-12-04T10:13:48.5032168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5032249Z with policy(): 2025-12-04T10:13:48.5032700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5032793Z raise RuntimeError(msg) 2025-12-04T10:13:48.5033893Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 176640 on device 3. CUDA driver allocated memory was 611254272 and is now 674168832. 2025-12-04T10:13:48.5033902Z 2025-12-04T10:13:48.5034128Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5034755Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5034762Z 2025-12-04T10:13:48.5034996Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5035001Z 2025-12-04T10:13:48.5035315Z 2025-12-04T10:13:48.5035513Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.5035746Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T10:13:48.5036454Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aa2eb835ecdd4375.xml - 2025-12-04T10:13:48.5036599Z =========================== short test summary info ============================ 2025-12-04T10:13:48.5037385Z FAILED [10.5039s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.5037517Z Traceback (most recent call last): 2025-12-04T10:13:48.5037999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5038094Z getattr(self, test_name)() 2025-12-04T10:13:48.5038561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5038641Z fn() 2025-12-04T10:13:48.5039086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5039178Z method(*args, **kwargs) 2025-12-04T10:13:48.5039621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5039707Z method(*args, **kwargs) 2025-12-04T10:13:48.5040148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5040230Z with policy(): 2025-12-04T10:13:48.5040674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5040773Z raise RuntimeError(msg) 2025-12-04T10:13:48.5041897Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 176640 on device 1. CUDA driver allocated memory was 607059968 and is now 674168832. 
2025-12-04T10:13:48.5041903Z 2025-12-04T10:13:48.5042093Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5042726Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5042733Z 2025-12-04T10:13:48.5042966Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5042970Z 2025-12-04T10:13:48.5043109Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.5043213Z Traceback (most recent call last): 2025-12-04T10:13:48.5043695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5043789Z getattr(self, test_name)() 2025-12-04T10:13:48.5044256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5044336Z fn() 2025-12-04T10:13:48.5044778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5044895Z method(*args, **kwargs) 2025-12-04T10:13:48.5045335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5045424Z method(*args, **kwargs) 2025-12-04T10:13:48.5045865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5045945Z with policy(): 2025-12-04T10:13:48.5046412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5046507Z raise RuntimeError(msg) 2025-12-04T10:13:48.5047599Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 176640 on device 3. CUDA driver allocated memory was 611254272 and is now 674168832. 2025-12-04T10:13:48.5047605Z 2025-12-04T10:13:48.5047796Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5048425Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5048457Z 2025-12-04T10:13:48.5048688Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5048841Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.5048999Z ====================== 1 failed, 20 deselected in 10.72s ======================= 2025-12-04T10:13:48.5049087Z Got exit code 1 2025-12-04T10:13:48.5049175Z Retrying single test... 
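The repro block above is driven by two environment variables: PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 turns on the per-test leak checker, and PYTORCH_PRINT_REPRO_ON_FAILURE=0 silences the repro hint. A sketch of invoking the same command from Python; the subprocess wrapper itself is illustrative, only the command and variables come from the log:

    import os
    import subprocess
    import sys

    env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
    # env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"  # optional: hide the repro hint
    subprocess.run(
        [
            sys.executable,
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda",
        ],
        env=env,
        check=True,  # raise if the test process exits non-zero
    )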
2025-12-04T10:13:48.5049724Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5e3a192bec2a8308.xml 2025-12-04T10:13:48.5049865Z ============================= test session starts ============================== 2025-12-04T10:13:48.5050168Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.5050265Z cachedir: .pytest_cache 2025-12-04T10:13:48.5050717Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.5050821Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.5050913Z configfile: pytest.ini 2025-12-04T10:13:48.5051386Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.5051570Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.5052300Z stepcurrent: skipping 20 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5052396Z Running 1 items in this shard 2025-12-04T10:13:48.5052401Z 2025-12-04T10:13:48.5053434Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda I1204 10:03:59.699000 87184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 87236 2025-12-04T10:13:48.5054078Z I1204 10:03:59.700000 87184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 87237 2025-12-04T10:13:48.5054569Z I1204 10:03:59.701000 87184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 87238 2025-12-04T10:13:48.5055052Z I1204 10:03:59.702000 87184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 87239 2025-12-04T10:13:48.5056033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.5056160Z return wrapper_cls(module, **kwargs) 2025-12-04T10:13:48.5057162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.5057289Z return wrapper_cls(module, **kwargs) 2025-12-04T10:13:48.5058246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.5058394Z return wrapper_cls(module, **kwargs) 2025-12-04T10:13:48.5059355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.5059471Z return wrapper_cls(module, **kwargs) 2025-12-04T10:13:48.5061471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5061595Z _warn_cpu_init() 2025-12-04T10:13:48.5063605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5063702Z _warn_cpu_init() 2025-12-04T10:13:48.5065803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5065890Z _warn_cpu_init() 2025-12-04T10:13:48.5067670Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5067761Z _warn_cpu_init() 2025-12-04T10:13:48.5068635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.5068867Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T10:13:48.5069738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.5069968Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T10:13:48.5070866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.5071090Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T10:13:48.5071962Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
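The UserWarning above asks for a `device_id` so FSDP's sharding initialization runs on GPU (also required for `sync_module_states=True`), and the FutureWarning flags `NO_SHARD` as deprecated in favor of DistributedDataParallel. A minimal sketch following those recommendations; the Linear module is a placeholder and an initialized default process group (e.g. via torchrun) is assumed:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    device = torch.device("cuda", torch.cuda.current_device())
    model = nn.Linear(8, 8)  # placeholder module; assumes dist is already initialized
    fsdp_model = FSDP(
        model,
        device_id=device,          # run sharding init on GPU instead of CPU
        sync_module_states=True,   # needs GPU communication, hence device_id above
    )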
2025-12-04T10:13:48.5072182Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T10:13:48.5072902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.5072999Z return func(*args, **kwargs) 2025-12-04T10:13:48.5074043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.5074217Z return func(*args, **kwargs) 2025-12-04T10:13:48.5075506Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.5075694Z return func(*args, **kwargs) 2025-12-04T10:13:48.5076408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.5076508Z return func(*args, **kwargs) 2025-12-04T10:13:48.5077218Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.5077315Z return func(*args, **kwargs) 2025-12-04T10:13:48.5078031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.5078127Z return func(*args, **kwargs) 2025-12-04T10:13:48.5079162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.5079276Z return func(*args, **kwargs) 2025-12-04T10:13:48.5080021Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.5080130Z return func(*args, **kwargs) 2025-12-04T10:13:48.5081204Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.5081308Z return func(*args, **kwargs) 2025-12-04T10:13:48.5085789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning.
(Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.5086230Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.5090692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.5091125Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.5095713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.5096106Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.5100612Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. 
(Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.5100998Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.5101458Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5101989Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5102984Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5103582Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5104574Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5105004Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5106154Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5106588Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5107433Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5107890Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5108738Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5109133Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5109981Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5110411Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5111920Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 176640 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 
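The AccumulateGrad warning repeated above names its own switch: when the stream mismatch is intentional, it can be silenced with torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False); otherwise the fix it suggests is to stop keeping the previous iteration's autograd graph alive, for example by retaining only a detached copy of the loss. A short sketch:

    import torch

    # Only when the mismatch is known to be intentional, per the warning text above:
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)

    # Otherwise, avoid retaining the old graph between iterations, e.g.:
    # running_loss += loss.detach()   # keep a detached scalar, not the whole graph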
2025-12-04T10:13:48.5112241Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5112853Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5113881Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5114201Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5114840Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5115315Z [rank0]:E1204 10:04:07.588000 87236 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.5115717Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5116180Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5117084Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5117537Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5118407Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5118791Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5119634Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5120064Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5120934Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5121360Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5122206Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5122593Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5123449Z [rank1]:E1204 10:04:07.589000 87237 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5123875Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5125402Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 1. CUDA driver allocated memory was 609157120 and is now 674168832. 2025-12-04T10:13:48.5125721Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5126300Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5127337Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5127652Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5128293Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5128768Z [rank1]:E1204 10:04:07.589000 87237 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.5129201Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5129666Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5130543Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5131021Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5131887Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5132240Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5133084Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5133780Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5134729Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5135212Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5136167Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5136605Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5137564Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5138047Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5139786Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 2. CUDA driver allocated memory was 607059968 and is now 674168832. 2025-12-04T10:13:48.5140145Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5140794Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5141965Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5142325Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5143040Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5143605Z [rank2]:E1204 10:04:07.589000 87238 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.5144057Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5144580Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5145706Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5146287Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5147157Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5147508Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5148380Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5148811Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5149655Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5150081Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5150929Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5151323Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5152175Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5152628Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5154131Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 3. CUDA driver allocated memory was 604962816 and is now 674168832. 
2025-12-04T10:13:48.5154450Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5155024Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5156057Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5156371Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5157030Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5157512Z [rank3]:E1204 10:04:07.591000 87239 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.5157601Z dist init r=0, world=4 2025-12-04T10:13:48.5157682Z dist init r=2, world=4 2025-12-04T10:13:48.5157787Z dist init r=3, world=4 2025-12-04T10:13:48.5157873Z dist init r=1, world=4 2025-12-04T10:13:48.5158898Z [rank0]:[W1204 10:04:07.095960991 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.5158983Z FAILED [10.2096s] [100%] 2025-12-04T10:13:48.5158990Z 2025-12-04T10:13:48.5159121Z =================================== FAILURES =================================== 2025-12-04T10:13:48.5159411Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda _ 2025-12-04T10:13:48.5159544Z Traceback (most recent call last): 2025-12-04T10:13:48.5160019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.5160120Z self._join_processes(fn) 2025-12-04T10:13:48.5160638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.5160759Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.5161294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.5161390Z raise RuntimeError(error) 2025-12-04T10:13:48.5161597Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.5161708Z Traceback (most recent call last): 2025-12-04T10:13:48.5162180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5162278Z getattr(self, test_name)() 2025-12-04T10:13:48.5162749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5162825Z fn() 2025-12-04T10:13:48.5163272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5163360Z method(*args, **kwargs) 2025-12-04T10:13:48.5163827Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5163920Z method(*args, **kwargs) 2025-12-04T10:13:48.5164359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5164443Z with policy(): 2025-12-04T10:13:48.5164887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5164982Z raise RuntimeError(msg) 2025-12-04T10:13:48.5166085Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 176640 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 2025-12-04T10:13:48.5166093Z 2025-12-04T10:13:48.5166282Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5166913Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5166924Z 2025-12-04T10:13:48.5167180Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5167186Z 2025-12-04T10:13:48.5167190Z 2025-12-04T10:13:48.5167381Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.5167618Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.5168317Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5e3a192bec2a8308.xml - 2025-12-04T10:13:48.5168494Z =========================== short test summary info ============================ 2025-12-04T10:13:48.5169276Z FAILED [10.2096s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.5169378Z Traceback (most recent call last): 2025-12-04T10:13:48.5169866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5169962Z getattr(self, test_name)() 2025-12-04T10:13:48.5170435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5170549Z fn() 2025-12-04T10:13:48.5170995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5171090Z method(*args, **kwargs) 2025-12-04T10:13:48.5171528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5171616Z method(*args, **kwargs) 2025-12-04T10:13:48.5172063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5172144Z with policy(): 2025-12-04T10:13:48.5172595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5172686Z raise RuntimeError(msg) 2025-12-04T10:13:48.5174040Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 176640 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 2025-12-04T10:13:48.5174050Z 2025-12-04T10:13:48.5174270Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5175023Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5175029Z 2025-12-04T10:13:48.5175292Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5175462Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.5175637Z ====================== 1 failed, 32 deselected in 10.43s ======================= 2025-12-04T10:13:48.5175734Z Got exit code 1 2025-12-04T10:13:48.5175834Z Retrying single test... 2025-12-04T10:13:48.5176459Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c427df5212a82823.xml 2025-12-04T10:13:48.5176614Z ============================= test session starts ============================== 2025-12-04T10:13:48.5176959Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.5177064Z cachedir: .pytest_cache 2025-12-04T10:13:48.5177580Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.5177695Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.5177801Z configfile: pytest.ini 2025-12-04T10:13:48.5178354Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.5178570Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.5179556Z stepcurrent: skipping 20 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5179665Z Running 1 items in this shard 2025-12-04T10:13:48.5179671Z 2025-12-04T10:13:48.5180827Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda I1204 10:04:14.659000 87521 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 87573 2025-12-04T10:13:48.5181322Z I1204 10:04:14.660000 87521 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 87574 2025-12-04T10:13:48.5181814Z I1204 10:04:14.661000 87521 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 87575 2025-12-04T10:13:48.5182300Z I1204 10:04:14.662000 87521 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 87576 2025-12-04T10:13:48.5183318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.5183447Z return wrapper_cls(module, **kwargs) 2025-12-04T10:13:48.5185460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5185558Z _warn_cpu_init() 2025-12-04T10:13:48.5186522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.5186647Z return wrapper_cls(module, **kwargs) 2025-12-04T10:13:48.5188669Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5188770Z _warn_cpu_init() 2025-12-04T10:13:48.5189762Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.5190019Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T10:13:48.5191077Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.5191316Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T10:13:48.5192227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.5192337Z return wrapper_cls(module, **kwargs) 2025-12-04T10:13:48.5193276Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.5193395Z return wrapper_cls(module, **kwargs) 2025-12-04T10:13:48.5195278Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.5195398Z _warn_cpu_init() 2025-12-04T10:13:48.5197263Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5197384Z _warn_cpu_init() 2025-12-04T10:13:48.5198306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.5198545Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T10:13:48.5199473Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.5199812Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T10:13:48.5200500Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.5200595Z return func(*args, **kwargs) 2025-12-04T10:13:48.5201274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.5201368Z return func(*args, **kwargs) 2025-12-04T10:13:48.5202038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.5202161Z return func(*args, **kwargs) 2025-12-04T10:13:48.5202832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.5202923Z return func(*args, **kwargs) 2025-12-04T10:13:48.5203596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.5203690Z return func(*args, **kwargs) 2025-12-04T10:13:48.5204366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.5204459Z return func(*args, **kwargs) 2025-12-04T10:13:48.5205122Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.5205216Z return func(*args, **kwargs) 2025-12-04T10:13:48.5205880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T10:13:48.5206004Z return func(*args, **kwargs) 2025-12-04T10:13:48.5206882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.5206976Z return func(*args, **kwargs) 2025-12-04T10:13:48.5210947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.5211351Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.5215772Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.5216177Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.5220677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. 
If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.5221073Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.5225547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T10:13:48.5226062Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T10:13:48.5226472Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5226942Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5227855Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5228299Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5229180Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5229536Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5230379Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5230811Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5231659Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5232115Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5232960Z [rank0]:E1204 10:04:22.678000 87573 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5233348Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5234208Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5234639Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5236184Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 176640 on device 0. CUDA driver allocated memory was 716111872 and is now 783220736. 2025-12-04T10:13:48.5236508Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5237096Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5238127Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5238471Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5239106Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5239588Z [rank0]:E1204 10:04:22.678000 87573 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.5240018Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5240485Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5241373Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5241817Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5242686Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5243067Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5244546Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5245333Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5246959Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5247782Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5249235Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5249961Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5251487Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5252429Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5256067Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 1. CUDA driver allocated memory was 604962816 and is now 674168832. 
2025-12-04T10:13:48.5256768Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5257965Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5260288Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5260996Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5262412Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5263526Z [rank1]:E1204 10:04:22.679000 87574 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.5264345Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5265331Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5267228Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5268221Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5269957Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5270650Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5272501Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5273292Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5274907Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5275772Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5277374Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5278103Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5280008Z [rank3]:E1204 10:04:22.680000 87576 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5280623Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5282325Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 3. CUDA driver allocated memory was 560922624 and is now 674168832. 2025-12-04T10:13:48.5282737Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5283399Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5284566Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5284925Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5285688Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5286233Z [rank3]:E1204 10:04:22.680000 87576 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.5286685Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5287209Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5288219Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5288725Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5289708Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5290110Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5291113Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5291697Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5292591Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5293041Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5294223Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5294663Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5295662Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5296149Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5297842Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 2. CUDA driver allocated memory was 600768512 and is now 674168832. 2025-12-04T10:13:48.5298234Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5298894Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5300053Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5300440Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5301159Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5301699Z [rank2]:E1204 10:04:22.687000 87575 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.5301801Z dist init r=0, world=4 2025-12-04T10:13:48.5301895Z dist init r=1, world=4 2025-12-04T10:13:48.5301986Z dist init r=3, world=4 2025-12-04T10:13:48.5302084Z dist init r=2, world=4 2025-12-04T10:13:48.5303236Z [rank0]:[W1204 10:04:23.188004154 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.5303339Z FAILED [10.7278s] [100%] 2025-12-04T10:13:48.5303347Z 2025-12-04T10:13:48.5303487Z =================================== FAILURES =================================== 2025-12-04T10:13:48.5303821Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda _ 2025-12-04T10:13:48.5303942Z Traceback (most recent call last): 2025-12-04T10:13:48.5304528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.5304638Z self._join_processes(fn) 2025-12-04T10:13:48.5305225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.5305360Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.5306164Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.5306260Z raise RuntimeError(error) 2025-12-04T10:13:48.5306463Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.5306574Z Traceback (most recent call last): 2025-12-04T10:13:48.5307045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5307141Z getattr(self, test_name)() 2025-12-04T10:13:48.5307612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5307686Z fn() 2025-12-04T10:13:48.5308161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5308252Z method(*args, **kwargs) 2025-12-04T10:13:48.5308695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5308787Z method(*args, **kwargs) 2025-12-04T10:13:48.5309228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5309340Z with policy(): 2025-12-04T10:13:48.5309783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5309876Z raise RuntimeError(msg) 2025-12-04T10:13:48.5310982Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 176640 on device 0. CUDA driver allocated memory was 716111872 and is now 783220736. 
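The ProcessGroupNCCL warning repeated in this log asks worker processes to tear the process group down explicitly. A hedged sketch of that cleanup, assuming the launcher (for example torchrun) has already set MASTER_ADDR/MASTER_PORT; run_worker is a hypothetical entry point:

    import torch.distributed as dist

    def run_worker(rank: int, world_size: int) -> None:
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            pass  # test or training body goes here
        finally:
            # Explicit shutdown avoids the "destroy_process_group() was not called
            # before program exit" warning seen above.
            dist.destroy_process_group()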
2025-12-04T10:13:48.5310989Z 2025-12-04T10:13:48.5311177Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5311842Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5311848Z 2025-12-04T10:13:48.5312076Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5312083Z 2025-12-04T10:13:48.5312222Z Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.5312328Z Traceback (most recent call last): 2025-12-04T10:13:48.5312807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5312909Z getattr(self, test_name)() 2025-12-04T10:13:48.5313377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5313451Z fn() 2025-12-04T10:13:48.5313898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5313988Z method(*args, **kwargs) 2025-12-04T10:13:48.5314429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5314520Z method(*args, **kwargs) 2025-12-04T10:13:48.5314959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5315043Z with policy(): 2025-12-04T10:13:48.5315520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5315612Z raise RuntimeError(msg) 2025-12-04T10:13:48.5316713Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 1. CUDA driver allocated memory was 604962816 and is now 674168832. 
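The _init_utils.py UserWarning that recurs in this run recommends passing device_id so FSDP moves the CPU-resident module onto the GPU before sharding initialization. A minimal sketch under that assumption, with a placeholder module and an already-initialized process group:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    model = nn.Linear(8, 8)  # placeholder for the test model, constructed on CPU
    # device_id lets FSDP move the module to this rank's GPU for sharding init;
    # per the warning it is also needed for sync_module_states=True.
    fsdp_model = FSDP(model, device_id=torch.cuda.current_device())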
2025-12-04T10:13:48.5316720Z 2025-12-04T10:13:48.5316906Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5317542Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5317549Z 2025-12-04T10:13:48.5317777Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5317781Z 2025-12-04T10:13:48.5317924Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.5318031Z Traceback (most recent call last): 2025-12-04T10:13:48.5318511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5318635Z getattr(self, test_name)() 2025-12-04T10:13:48.5319102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5319179Z fn() 2025-12-04T10:13:48.5319626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5319716Z method(*args, **kwargs) 2025-12-04T10:13:48.5320180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5320270Z method(*args, **kwargs) 2025-12-04T10:13:48.5320711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5320797Z with policy(): 2025-12-04T10:13:48.5321243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5321337Z raise RuntimeError(msg) 2025-12-04T10:13:48.5322445Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 3. CUDA driver allocated memory was 560922624 and is now 674168832. 2025-12-04T10:13:48.5322476Z 2025-12-04T10:13:48.5322661Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5323292Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5323297Z 2025-12-04T10:13:48.5323526Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5323531Z 2025-12-04T10:13:48.5323535Z 2025-12-04T10:13:48.5323728Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.5323957Z Process 0 terminated with exit code 10, terminating remaining processes. 
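The FutureWarning earlier in this run ("The `NO_SHARD` sharding strategy is deprecated") points at plain DistributedDataParallel as the replacement. A sketch of that swap, assuming an initialized process group and a LOCAL_RANK environment variable as set by torchrun:

    import os
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    model = nn.Linear(8, 8).cuda(local_rank)  # placeholder module on this rank's GPU
    ddp_model = DDP(model, device_ids=[local_rank])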
2025-12-04T10:13:48.5324660Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c427df5212a82823.xml - 2025-12-04T10:13:48.5324812Z =========================== short test summary info ============================ 2025-12-04T10:13:48.5325587Z FAILED [10.7278s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.5325697Z Traceback (most recent call last): 2025-12-04T10:13:48.5326273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5326371Z getattr(self, test_name)() 2025-12-04T10:13:48.5326847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5326925Z fn() 2025-12-04T10:13:48.5327372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5327471Z method(*args, **kwargs) 2025-12-04T10:13:48.5327912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5328007Z method(*args, **kwargs) 2025-12-04T10:13:48.5328451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5328534Z with policy(): 2025-12-04T10:13:48.5328987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5329079Z raise RuntimeError(msg) 2025-12-04T10:13:48.5330215Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 176640 on device 0. CUDA driver allocated memory was 716111872 and is now 783220736. 
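The c10d_logger warning seen in both runs ("barrier(): using the device under current context") can be silenced by binding each rank to its GPU at init time. A sketch assuming a recent PyTorch where init_process_group accepts device_id and the launcher provides LOCAL_RANK plus the usual rendezvous environment variables:

    import os
    import torch
    import torch.distributed as dist

    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    # device_id ties this rank to a single GPU, so collectives such as barrier()
    # no longer have to guess the device from the current context.
    dist.init_process_group("nccl", device_id=torch.device("cuda", local_rank))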
2025-12-04T10:13:48.5330222Z 2025-12-04T10:13:48.5330410Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5331039Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5331069Z 2025-12-04T10:13:48.5331304Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5331308Z 2025-12-04T10:13:48.5331452Z Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.5331560Z Traceback (most recent call last): 2025-12-04T10:13:48.5332039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5332134Z getattr(self, test_name)() 2025-12-04T10:13:48.5332610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5332711Z fn() 2025-12-04T10:13:48.5333161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5333316Z method(*args, **kwargs) 2025-12-04T10:13:48.5333953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5334064Z method(*args, **kwargs) 2025-12-04T10:13:48.5334561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5334653Z with policy(): 2025-12-04T10:13:48.5335159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5335262Z raise RuntimeError(msg) 2025-12-04T10:13:48.5336503Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 1. CUDA driver allocated memory was 604962816 and is now 674168832. 
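The long autograd/graph.py warning captured earlier names its own switch for the case where the AccumulateGrad stream mismatch is intentional; as a one-line sketch:

    import torch

    # Only appropriate when the stream mismatch is known to be intentional.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)

Otherwise that warning suggests removing lingering references to the autograd graph (for example a retained loss) or performing DDP initialization on the same stream as the subsequent forwards.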
2025-12-04T10:13:48.5336511Z 2025-12-04T10:13:48.5336719Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5337433Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5337446Z 2025-12-04T10:13:48.5337738Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5337744Z 2025-12-04T10:13:48.5337903Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.5338025Z Traceback (most recent call last): 2025-12-04T10:13:48.5338570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5338677Z getattr(self, test_name)() 2025-12-04T10:13:48.5339214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5339301Z fn() 2025-12-04T10:13:48.5339807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5339908Z method(*args, **kwargs) 2025-12-04T10:13:48.5340405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5340514Z method(*args, **kwargs) 2025-12-04T10:13:48.5341007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5341097Z with policy(): 2025-12-04T10:13:48.5341626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5341737Z raise RuntimeError(msg) 2025-12-04T10:13:48.5342979Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 3. CUDA driver allocated memory was 560922624 and is now 674168832. 2025-12-04T10:13:48.5343015Z 2025-12-04T10:13:48.5343223Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5343933Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5343944Z 2025-12-04T10:13:48.5344199Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5344373Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
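The repro banner printed for each rank can also be driven programmatically rather than from a shell. A sketch using only the command and environment variables shown in the log, to be run from the base repo dir:

    import os
    import subprocess

    env = dict(
        os.environ,
        PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",  # enable the leak checker, as in the banner
        PYTORCH_PRINT_REPRO_ON_FAILURE="0",    # suppress the repro banner itself
    )
    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda",
        ],
        env=env,
        check=False,
    )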
2025-12-04T10:13:48.5344550Z ====================== 1 failed, 32 deselected in 10.94s ======================= 2025-12-04T10:13:48.5344685Z Got exit code 1 2025-12-04T10:13:48.5345316Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T10:13:48.5345807Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.5346355Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a459d6ece2e0d396.xml 2025-12-04T10:13:48.5346502Z ============================= test session starts ============================== 2025-12-04T10:13:48.5346805Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.5346897Z cachedir: .pytest_cache 2025-12-04T10:13:48.5347354Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.5347457Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.5347554Z configfile: pytest.ini 2025-12-04T10:13:48.5348019Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.5348205Z collecting ... collected 60 items / 21 deselected / 39 selected 2025-12-04T10:13:48.5348335Z stepcurrent: skipping 21 already run items. 2025-12-04T10:13:48.5348430Z Running 12 items in this shard 2025-12-04T10:13:48.5348435Z 2025-12-04T10:13:48.5349438Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda I1204 10:04:29.580000 87858 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 87910 2025-12-04T10:13:48.5349884Z I1204 10:04:29.581000 87858 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 87911 2025-12-04T10:13:48.5350315Z I1204 10:04:29.581000 87858 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 87912 2025-12-04T10:13:48.5350749Z I1204 10:04:29.582000 87858 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 87913 2025-12-04T10:13:48.5352541Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5352634Z _warn_cpu_init() 2025-12-04T10:13:48.5354449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.5354543Z _warn_cpu_init() 2025-12-04T10:13:48.5356345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5356433Z _warn_cpu_init() 2025-12-04T10:13:48.5358202Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5358311Z _warn_cpu_init() 2025-12-04T10:13:48.5359189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.5359284Z return func(*args, **kwargs) 2025-12-04T10:13:48.5359691Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5360160Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5361046Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5361498Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5362397Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5362749Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5363592Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5364033Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5364875Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5365304Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5366148Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5366566Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5367424Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5367852Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5369401Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 607059968 and is now 649003008. 2025-12-04T10:13:48.5369722Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5370310Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5371389Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5371706Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5372340Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5372820Z [rank1]:E1204 10:04:38.023000 87911 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.5373278Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5373937Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5374935Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5375476Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5376457Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5376852Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5377808Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5378294Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5379474Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5379956Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5381183Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5382012Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5383380Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5383975Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5385696Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 
2025-12-04T10:13:48.5386102Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5386756Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5387958Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5388315Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5389036Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5389583Z [rank0]:E1204 10:04:38.023000 87910 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.5390038Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5390667Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5391756Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5392331Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5393198Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5393559Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5394400Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5394836Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5395940Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5396370Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5397225Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5397660Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5398513Z [rank3]:E1204 10:04:38.025000 87913 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5398942Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5400455Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T10:13:48.5400802Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5401387Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5402442Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5402756Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5403400Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5403876Z [rank3]:E1204 10:04:38.025000 87913 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.5404278Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5404769Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5405650Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5406102Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5406972Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5407323Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5408170Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5408627Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5409469Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5409892Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5410767Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5411157Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5412008Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5412434Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5414292Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 491716608 and is now 649003008. 2025-12-04T10:13:48.5414658Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5415307Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5416498Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5416856Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5417571Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5418146Z [rank2]:E1204 10:04:38.031000 87912 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.5418251Z dist init r=1, world=4 2025-12-04T10:13:48.5418345Z dist init r=3, world=4 2025-12-04T10:13:48.5418435Z dist init r=2, world=4 2025-12-04T10:13:48.5418532Z dist init r=0, world=4 2025-12-04T10:13:48.5419684Z [rank0]:[W1204 10:04:38.540387881 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.5419785Z FAILED [10.3937s] [ 8%] 2025-12-04T10:13:48.5419802Z 2025-12-04T10:13:48.5419941Z =================================== FAILURES =================================== 2025-12-04T10:13:48.5420291Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda _ 2025-12-04T10:13:48.5420412Z Traceback (most recent call last): 2025-12-04T10:13:48.5420955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.5421064Z self._join_processes(fn) 2025-12-04T10:13:48.5421649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.5421814Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.5422420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.5422531Z raise RuntimeError(error) 2025-12-04T10:13:48.5422758Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.5422880Z Traceback (most recent call last): 2025-12-04T10:13:48.5423449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5423556Z getattr(self, test_name)() 2025-12-04T10:13:48.5424093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5424179Z fn() 2025-12-04T10:13:48.5424685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5424785Z method(*args, **kwargs) 2025-12-04T10:13:48.5425285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5425423Z method(*args, **kwargs) 2025-12-04T10:13:48.5426125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5426207Z with policy(): 2025-12-04T10:13:48.5426661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5426752Z raise RuntimeError(msg) 2025-12-04T10:13:48.5427871Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 
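The leak checker's complaint above is a before/after comparison: the caching allocator went from 512 to 61952 bytes on each device (61952 - 512 = 61440 bytes, roughly 60 KiB retained), and the driver-level allocation grew as well. A rough illustration of that bookkeeping, using standard torch.cuda APIs rather than the internal CudaMemoryLeakCheck helper; run_test_body is a placeholder, not part of the test suite:

import torch

def cuda_memory_snapshot(device: int) -> tuple[int, int]:
    # Caching-allocator bytes plus a driver-level view (total - free): the two
    # numbers the error message above reports before and after the test.
    torch.cuda.synchronize(device)
    allocator_bytes = torch.cuda.memory_allocated(device)
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    return allocator_bytes, total_bytes - free_bytes

def run_test_body() -> None:
    pass  # placeholder for the actual test body

before = cuda_memory_snapshot(0)
run_test_body()
after = cuda_memory_snapshot(0)
if after[0] > before[0]:
    # In the log above: 61952 - 512 = 61440 bytes (~60 KiB) still allocated per device.
    raise RuntimeError(f"possible CUDA leak: {before} -> {after}")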
2025-12-04T10:13:48.5427877Z 2025-12-04T10:13:48.5428067Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5428722Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5428728Z 2025-12-04T10:13:48.5428957Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5428964Z 2025-12-04T10:13:48.5428969Z 2025-12-04T10:13:48.5429159Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.5429397Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.5430125Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a459d6ece2e0d396.xml - 2025-12-04T10:13:48.5430275Z =========================== short test summary info ============================ 2025-12-04T10:13:48.5431076Z FAILED [10.3937s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.5431181Z Traceback (most recent call last): 2025-12-04T10:13:48.5431671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5431768Z getattr(self, test_name)() 2025-12-04T10:13:48.5432245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5432322Z fn() 2025-12-04T10:13:48.5432767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5432864Z method(*args, **kwargs) 2025-12-04T10:13:48.5433331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5433418Z method(*args, **kwargs) 2025-12-04T10:13:48.5433866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5433952Z with policy(): 2025-12-04T10:13:48.5434403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5434522Z raise RuntimeError(msg) 2025-12-04T10:13:48.5435638Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T10:13:48.5435644Z 2025-12-04T10:13:48.5435835Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5436493Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5436525Z 2025-12-04T10:13:48.5436761Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5436916Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
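The harness prints a ready-made repro command for the failure; one way to script the same invocation locally. The command and environment variable names are taken verbatim from the log, the subprocess wrapper is just a convenience:

import os
import subprocess

env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
# env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"  # silences the repro banner if desired
subprocess.run(
    [
        "python",
        "test/distributed/fsdp/test_fsdp_core.py",
        "TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda",
    ],
    env=env,
    check=True,  # raise if the test exits non-zero, as it does in this run
)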
2025-12-04T10:13:48.5437071Z ====================== 1 failed, 21 deselected in 10.61s ======================= 2025-12-04T10:13:48.5437160Z Got exit code 1 2025-12-04T10:13:48.5437252Z Retrying single test... 2025-12-04T10:13:48.5437811Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-82bd08cbda0b3168.xml 2025-12-04T10:13:48.5437948Z ============================= test session starts ============================== 2025-12-04T10:13:48.5438252Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.5438350Z cachedir: .pytest_cache 2025-12-04T10:13:48.5438803Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.5438907Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.5439001Z configfile: pytest.ini 2025-12-04T10:13:48.5439473Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.5439670Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.5440415Z stepcurrent: skipping 21 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5440510Z Running 1 items in this shard 2025-12-04T10:13:48.5440515Z 2025-12-04T10:13:48.5441490Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda I1204 10:04:44.819000 88195 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 88247 2025-12-04T10:13:48.5441928Z I1204 10:04:44.820000 88195 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 88248 2025-12-04T10:13:48.5442363Z I1204 10:04:44.821000 88195 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 88249 2025-12-04T10:13:48.5442792Z I1204 10:04:44.822000 88195 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 88250 2025-12-04T10:13:48.5444623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5444707Z _warn_cpu_init() 2025-12-04T10:13:48.5446484Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
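Both UserWarnings repeated throughout this run ask for an explicit device: FSDP wants `device_id` so sharding initialization (and `sync_module_states=True`) happens on the GPU, and `barrier()` stops guessing once `init_process_group` is told the device. A minimal sketch of those two changes, assuming a recent PyTorch with NCCL and one GPU per rank; the module here is a placeholder, not the test's model:

import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)
device = torch.device("cuda", local_rank)

# Binding the process group to a device addresses the barrier() warning.
dist.init_process_group(backend="nccl", device_id=device)

module = torch.nn.Linear(8, 8)  # placeholder for the real (CPU-resident) model
fsdp_module = FSDP(
    module,
    device_id=device,           # sharding init runs on the GPU instead of the CPU
    sync_module_states=True,    # needs GPU communication, hence the explicit device
)
dist.barrier()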
2025-12-04T10:13:48.5446599Z _warn_cpu_init() 2025-12-04T10:13:48.5448370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5448490Z _warn_cpu_init() 2025-12-04T10:13:48.5450241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5450331Z _warn_cpu_init() 2025-12-04T10:13:48.5451207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.5451307Z return func(*args, **kwargs) 2025-12-04T10:13:48.5451711Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5452179Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5453064Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5453800Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5454796Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5455188Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5456150Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5456632Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5457581Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5458068Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5459049Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5459495Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5460449Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5460968Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5462680Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 716111872 and is now 758054912. 2025-12-04T10:13:48.5463068Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5463724Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5464912Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5465276Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5466180Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5466668Z [rank0]:E1204 10:04:53.203000 88247 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.5467060Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5467528Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5468435Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5468879Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5469765Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5470115Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5470961Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5471391Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5472257Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5472688Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5473534Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5473955Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5474803Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5475237Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5476755Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T10:13:48.5477099Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5477690Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5478890Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5479411Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5480126Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5480674Z [rank2]:E1204 10:04:53.205000 88249 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.5481122Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5481706Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5482703Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5483205Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5484192Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5484587Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5485542Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5486058Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5487010Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5487496Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5488481Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5488928Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5489881Z [rank1]:E1204 10:04:53.205000 88248 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5490368Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5492124Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 611254272 and is now 649003008. 2025-12-04T10:13:48.5492445Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5493026Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5494364Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5494735Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5495442Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5496039Z [rank1]:E1204 10:04:53.205000 88248 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.5496484Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5497007Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5498004Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5498509Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5499497Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5499887Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5500879Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5501362Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5502310Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5502822Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5503769Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5504215Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5505171Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5505792Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5507422Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T10:13:48.5507739Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5508319Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5509371Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5509694Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5510346Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5510833Z [rank3]:E1204 10:04:53.206000 88250 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.5510918Z dist init r=3, world=4 2025-12-04T10:13:48.5510999Z dist init r=2, world=4 2025-12-04T10:13:48.5511084Z dist init r=1, world=4 2025-12-04T10:13:48.5511165Z dist init r=0, world=4 2025-12-04T10:13:48.5512183Z [rank0]:[W1204 10:04:53.719649244 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.5512274Z FAILED [10.2866s] [100%] 2025-12-04T10:13:48.5512281Z 2025-12-04T10:13:48.5512406Z =================================== FAILURES =================================== 2025-12-04T10:13:48.5512721Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda _ 2025-12-04T10:13:48.5512822Z Traceback (most recent call last): 2025-12-04T10:13:48.5513328Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.5513428Z self._join_processes(fn) 2025-12-04T10:13:48.5513941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.5514069Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.5514596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.5514719Z raise RuntimeError(error) 2025-12-04T10:13:48.5514926Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.5515027Z Traceback (most recent call last): 2025-12-04T10:13:48.5515498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5515596Z getattr(self, test_name)() 2025-12-04T10:13:48.5516063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5516144Z fn() 2025-12-04T10:13:48.5516585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5516699Z method(*args, **kwargs) 2025-12-04T10:13:48.5517141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5517230Z method(*args, **kwargs) 2025-12-04T10:13:48.5517672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5517760Z with policy(): 2025-12-04T10:13:48.5518211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5518308Z raise RuntimeError(msg) 2025-12-04T10:13:48.5519425Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 716111872 and is now 758054912. 
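The ProcessGroupNCCL warning at the start of this block ("destroy_process_group() was not called before program exit") has a one-line remedy: shut the default group down explicitly before the process exits. A small sketch of that cleanup pattern:

import torch.distributed as dist

def run_with_cleanup() -> None:
    try:
        ...  # init_process_group(...), build the FSDP model, run the test body
    finally:
        if dist.is_initialized():
            dist.destroy_process_group()  # avoids the resource-leak warning at exit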
2025-12-04T10:13:48.5519433Z 2025-12-04T10:13:48.5519621Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5520276Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5520283Z 2025-12-04T10:13:48.5520512Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5520517Z 2025-12-04T10:13:48.5520547Z 2025-12-04T10:13:48.5520747Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.5520977Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.5521686Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-82bd08cbda0b3168.xml - 2025-12-04T10:13:48.5521835Z =========================== short test summary info ============================ 2025-12-04T10:13:48.5522625Z FAILED [10.2866s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.5522736Z Traceback (most recent call last): 2025-12-04T10:13:48.5523218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5523319Z getattr(self, test_name)() 2025-12-04T10:13:48.5523788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5523864Z fn() 2025-12-04T10:13:48.5524336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5524425Z method(*args, **kwargs) 2025-12-04T10:13:48.5524866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5524961Z method(*args, **kwargs) 2025-12-04T10:13:48.5525398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5525513Z with policy(): 2025-12-04T10:13:48.5525959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5526053Z raise RuntimeError(msg) 2025-12-04T10:13:48.5527170Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 716111872 and is now 758054912. 2025-12-04T10:13:48.5527176Z 2025-12-04T10:13:48.5527454Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5528114Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5528119Z 2025-12-04T10:13:48.5528348Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5528500Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
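The surrounding lines show the shape of the multiprocess harness: a parent starts four ranks, each failing rank exits with code 10, and the parent then reports "Process 0 exited with error code 10". A simplified stand-in for that flow (not the actual common_distributed.py implementation) using torch.multiprocessing:

import torch.multiprocessing as mp

def _worker(rank: int, world_size: int) -> None:
    # Placeholder: the real child initializes the process group, runs the test for
    # this rank, and exits with code 10 when the mem-leak check raises.
    pass

if __name__ == "__main__":
    world_size = 4
    # join=True (the default) raises in the parent if any child exits non-zero,
    # which is how a single leaking rank fails the whole test above.
    mp.spawn(_worker, args=(world_size,), nprocs=world_size)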
2025-12-04T10:13:48.5528663Z ====================== 1 failed, 32 deselected in 10.50s ======================= 2025-12-04T10:13:48.5528744Z Got exit code 1 2025-12-04T10:13:48.5528840Z Retrying single test... 2025-12-04T10:13:48.5529389Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cb7d608d20fa1845.xml 2025-12-04T10:13:48.5529529Z ============================= test session starts ============================== 2025-12-04T10:13:48.5529839Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.5529932Z cachedir: .pytest_cache 2025-12-04T10:13:48.5530385Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.5530499Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.5530589Z configfile: pytest.ini 2025-12-04T10:13:48.5531089Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.5531275Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.5531993Z stepcurrent: skipping 21 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5532095Z Running 1 items in this shard 2025-12-04T10:13:48.5532100Z 2025-12-04T10:13:48.5533068Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda I1204 10:04:59.889000 88532 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 88584 2025-12-04T10:13:48.5533582Z I1204 10:04:59.890000 88532 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 88585 2025-12-04T10:13:48.5534241Z I1204 10:04:59.891000 88532 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 88586 2025-12-04T10:13:48.5534726Z I1204 10:04:59.892000 88532 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 88587 2025-12-04T10:13:48.5536795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5536891Z _warn_cpu_init() 2025-12-04T10:13:48.5538900Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
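"Retrying single test..." above narrows the session down to the one failing node id (the internal stepcurrent plugin handles the "skipping 21 already run items" part). Roughly the same rerun can be done with plain pytest; the node id below is copied from the log:

import pytest

failing_test = (
    "distributed/fsdp/test_fsdp_core.py::"
    "TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda"
)
# -x stops after the first failure, matching the "stopping after 1 failures" behaviour.
exit_code = pytest.main(["-x", "-v", failing_test])
print("Got exit code", int(exit_code))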
2025-12-04T10:13:48.5539023Z _warn_cpu_init() 2025-12-04T10:13:48.5541025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5541159Z _warn_cpu_init() 2025-12-04T10:13:48.5543167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5543260Z _warn_cpu_init() 2025-12-04T10:13:48.5544250Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.5544357Z return func(*args, **kwargs) 2025-12-04T10:13:48.5544809Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5545344Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5546424Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5546875Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5547736Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5548087Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5548936Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5549364Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5550234Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5550658Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5551508Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5551926Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5552773Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5553213Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5554724Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 720306176 and is now 758054912. 2025-12-04T10:13:48.5555076Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5555653Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5556709Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5557029Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5557668Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5558143Z [rank0]:E1204 10:05:08.177000 88584 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.5558539Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5559032Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5559916Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5560365Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5561237Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5561585Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5562437Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5562899Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5563747Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5564194Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5565726Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5566438Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5567984Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5568820Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5571595Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T10:13:48.5572223Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5573502Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5575912Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5576571Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5577901Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5579266Z [rank2]:E1204 10:05:08.177000 88586 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.5580133Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5581142Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5583085Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5584092Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5585953Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5586693Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5588589Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5589484Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5591313Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5592231Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5593868Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5594581Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5596206Z [rank1]:E1204 10:05:08.178000 88585 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5597175Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5600023Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T10:13:48.5600553Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5601145Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5602213Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5602534Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5603247Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5603726Z [rank1]:E1204 10:05:08.178000 88585 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.5604126Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5604791Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5605731Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5606217Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5607139Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5607543Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5608449Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5608900Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5609860Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5610312Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5611214Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5611663Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5612559Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5613023Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5614968Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T10:13:48.5615338Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5615992Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5617229Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5617587Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5618306Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5618848Z [rank3]:E1204 10:05:08.179000 88587 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.5618949Z dist init r=3, world=4 2025-12-04T10:13:48.5619050Z dist init r=1, world=4 2025-12-04T10:13:48.5619143Z dist init r=2, world=4 2025-12-04T10:13:48.5619235Z dist init r=0, world=4 2025-12-04T10:13:48.5620396Z [rank0]:[W1204 10:05:08.693268095 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.5620493Z FAILED [10.2545s] [100%] 2025-12-04T10:13:48.5620502Z 2025-12-04T10:13:48.5620649Z =================================== FAILURES =================================== 2025-12-04T10:13:48.5621027Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda _ 2025-12-04T10:13:48.5621144Z Traceback (most recent call last): 2025-12-04T10:13:48.5621695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.5621803Z self._join_processes(fn) 2025-12-04T10:13:48.5622382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.5622559Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.5623157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.5623273Z raise RuntimeError(error) 2025-12-04T10:13:48.5623500Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.5623615Z Traceback (most recent call last): 2025-12-04T10:13:48.5624155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5624260Z getattr(self, test_name)() 2025-12-04T10:13:48.5624826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5624910Z fn() 2025-12-04T10:13:48.5625407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5625624Z method(*args, **kwargs) 2025-12-04T10:13:48.5626197Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5626286Z method(*args, **kwargs) 2025-12-04T10:13:48.5626733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5626817Z with policy(): 2025-12-04T10:13:48.5627266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5627361Z raise RuntimeError(msg) 2025-12-04T10:13:48.5628480Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 649003008. 
2025-12-04T10:13:48.5628489Z 2025-12-04T10:13:48.5628686Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5629368Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5629374Z 2025-12-04T10:13:48.5629611Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5629616Z 2025-12-04T10:13:48.5629620Z 2025-12-04T10:13:48.5629813Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.5630048Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.5630757Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cb7d608d20fa1845.xml - 2025-12-04T10:13:48.5630902Z =========================== short test summary info ============================ 2025-12-04T10:13:48.5631711Z FAILED [10.2545s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.5631813Z Traceback (most recent call last): 2025-12-04T10:13:48.5632300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5632397Z getattr(self, test_name)() 2025-12-04T10:13:48.5632896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5632986Z fn() 2025-12-04T10:13:48.5633430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5633519Z method(*args, **kwargs) 2025-12-04T10:13:48.5633966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5634081Z method(*args, **kwargs) 2025-12-04T10:13:48.5634529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5634609Z with policy(): 2025-12-04T10:13:48.5635052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5635150Z raise RuntimeError(msg) 2025-12-04T10:13:48.5636262Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T10:13:48.5636294Z 2025-12-04T10:13:48.5636488Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5637140Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5637145Z 2025-12-04T10:13:48.5637380Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5637542Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:13:48.5637698Z ====================== 1 failed, 32 deselected in 10.47s ======================= 2025-12-04T10:13:48.5637787Z Got exit code 1 2025-12-04T10:13:48.5638366Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5638725Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.5639269Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3bf4168a6952dca5.xml 2025-12-04T10:13:48.5639412Z ============================= test session starts ============================== 2025-12-04T10:13:48.5639741Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.5639838Z cachedir: .pytest_cache 2025-12-04T10:13:48.5640292Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.5640404Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.5640496Z configfile: pytest.ini 2025-12-04T10:13:48.5640961Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.5641157Z collecting ... collected 60 items / 22 deselected / 38 selected 2025-12-04T10:13:48.5641278Z stepcurrent: skipping 22 already run items. 2025-12-04T10:13:48.5641371Z Running 11 items in this shard 2025-12-04T10:13:48.5641384Z 2025-12-04T10:13:48.5642354Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda I1204 10:05:14.849000 88869 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 88921 2025-12-04T10:13:48.5642792Z I1204 10:05:14.850000 88869 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 88922 2025-12-04T10:13:48.5643256Z I1204 10:05:14.851000 88869 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 88923 2025-12-04T10:13:48.5643689Z I1204 10:05:14.852000 88869 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 88924 2025-12-04T10:13:48.5645481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5645593Z _warn_cpu_init() 2025-12-04T10:13:48.5647380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.5647492Z _warn_cpu_init() 2025-12-04T10:13:48.5649273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5649357Z _warn_cpu_init() 2025-12-04T10:13:48.5651128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5651222Z _warn_cpu_init() 2025-12-04T10:13:48.5652101Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.5652205Z return func(*args, **kwargs) 2025-12-04T10:13:48.5652633Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5653102Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5654293Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5654796Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5655779Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5656172Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5657180Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5657663Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5658620Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5659140Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5660094Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5660541Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5661497Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5662014Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5663714Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 1. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.5664084Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5664741Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5666127Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5666451Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5667110Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5667597Z [rank1]:E1204 10:05:23.768000 88922 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.5667992Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5668454Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5669345Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5669791Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5670671Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5671017Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5671895Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5672322Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5673163Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5673622Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5674468Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5674861Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5675733Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5676163Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5677669Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 2. CUDA driver allocated memory was 607059968 and is now 625934336. 
2025-12-04T10:13:48.5677990Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5678572Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5680053Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5680501Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5681215Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5681762Z [rank2]:E1204 10:05:23.768000 88923 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.5682213Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5682737Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5683735Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5684238Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5685263Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5685658Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5686617Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5687137Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5688091Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5688579Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5689528Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5690012Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5691128Z [rank3]:E1204 10:05:23.770000 88924 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5692121Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5694470Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:48.5694858Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5695511Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5696770Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5697135Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5697848Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5698398Z [rank3]:E1204 10:05:23.770000 88924 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.5698849Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5699374Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5700388Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5700924Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5701922Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5702313Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5703310Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5703790Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5704753Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5705373Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5706365Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5706937Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5707868Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5708479Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5710344Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 0. CUDA driver allocated memory was 718209024 and is now 734986240. 2025-12-04T10:13:48.5710723Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5711434Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5712621Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5712972Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5713669Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5714198Z [rank0]:E1204 10:05:23.774000 88921 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.5714293Z dist init r=3, world=4 2025-12-04T10:13:48.5714391Z dist init r=1, world=4 2025-12-04T10:13:48.5714484Z dist init r=0, world=4 2025-12-04T10:13:48.5714580Z dist init r=2, world=4 2025-12-04T10:13:48.5715724Z [rank0]:[W1204 10:05:24.290918886 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.5715821Z FAILED [11.4283s] [ 9%] 2025-12-04T10:13:48.5715831Z 2025-12-04T10:13:48.5715982Z =================================== FAILURES =================================== 2025-12-04T10:13:48.5716318Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:48.5716439Z Traceback (most recent call last): 2025-12-04T10:13:48.5716997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.5717102Z self._join_processes(fn) 2025-12-04T10:13:48.5717677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.5717813Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.5718395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.5718510Z raise RuntimeError(error) 2025-12-04T10:13:48.5718764Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.5718885Z Traceback (most recent call last): 2025-12-04T10:13:48.5719403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5719510Z getattr(self, test_name)() 2025-12-04T10:13:48.5720029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5720111Z fn() 2025-12-04T10:13:48.5720599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5720704Z method(*args, **kwargs) 2025-12-04T10:13:48.5721192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5721297Z method(*args, **kwargs) 2025-12-04T10:13:48.5721780Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5721869Z with policy(): 2025-12-04T10:13:48.5722358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5722460Z raise RuntimeError(msg) 2025-12-04T10:13:48.5723712Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 1. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T10:13:48.5723720Z 2025-12-04T10:13:48.5723926Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5724635Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5724643Z 2025-12-04T10:13:48.5724903Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5724908Z 2025-12-04T10:13:48.5725062Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.5725180Z Traceback (most recent call last): 2025-12-04T10:13:48.5725711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5725815Z getattr(self, test_name)() 2025-12-04T10:13:48.5726333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5726418Z fn() 2025-12-04T10:13:48.5726904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5727030Z method(*args, **kwargs) 2025-12-04T10:13:48.5727514Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5727622Z method(*args, **kwargs) 2025-12-04T10:13:48.5728105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5728223Z with policy(): 2025-12-04T10:13:48.5728723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5728825Z raise RuntimeError(msg) 2025-12-04T10:13:48.5730044Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:48.5730053Z 2025-12-04T10:13:48.5730258Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5730999Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5731012Z 2025-12-04T10:13:48.5731264Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5731272Z 2025-12-04T10:13:48.5731277Z 2025-12-04T10:13:48.5731490Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.5731747Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T10:13:48.5732523Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3bf4168a6952dca5.xml - 2025-12-04T10:13:48.5732694Z =========================== short test summary info ============================ 2025-12-04T10:13:48.5733803Z FAILED [11.4283s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.5733932Z Traceback (most recent call last): 2025-12-04T10:13:48.5734483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5734596Z getattr(self, test_name)() 2025-12-04T10:13:48.5735126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5735222Z fn() 2025-12-04T10:13:48.5735764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5735874Z method(*args, **kwargs) 2025-12-04T10:13:48.5736379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5736482Z method(*args, **kwargs) 2025-12-04T10:13:48.5736986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5737079Z with policy(): 2025-12-04T10:13:48.5737592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5737706Z raise RuntimeError(msg) 2025-12-04T10:13:48.5738966Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 1. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T10:13:48.5738972Z 2025-12-04T10:13:48.5739193Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5739956Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5739964Z 2025-12-04T10:13:48.5740232Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5740238Z 2025-12-04T10:13:48.5740393Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.5740512Z Traceback (most recent call last): 2025-12-04T10:13:48.5741092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5741199Z getattr(self, test_name)() 2025-12-04T10:13:48.5741742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5741827Z fn() 2025-12-04T10:13:48.5742327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5742439Z method(*args, **kwargs) 2025-12-04T10:13:48.5742935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5743067Z method(*args, **kwargs) 2025-12-04T10:13:48.5743573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5743669Z with policy(): 2025-12-04T10:13:48.5744183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5744287Z raise RuntimeError(msg) 2025-12-04T10:13:48.5745531Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:48.5745549Z 2025-12-04T10:13:48.5745897Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5746712Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5746718Z 2025-12-04T10:13:48.5746969Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5747134Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.5747296Z ====================== 1 failed, 22 deselected in 11.65s ======================= 2025-12-04T10:13:48.5747391Z Got exit code 1 2025-12-04T10:13:48.5747523Z Retrying single test... 
2025-12-04T10:13:48.5748115Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-69250cff44e166fa.xml 2025-12-04T10:13:48.5748265Z ============================= test session starts ============================== 2025-12-04T10:13:48.5748749Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.5748858Z cachedir: .pytest_cache 2025-12-04T10:13:48.5749352Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.5749469Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.5749576Z configfile: pytest.ini 2025-12-04T10:13:48.5750089Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.5750301Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.5751088Z stepcurrent: skipping 22 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5751193Z Running 1 items in this shard 2025-12-04T10:13:48.5751229Z 2025-12-04T10:13:48.5752288Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda I1204 10:05:30.819000 89206 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 89258 2025-12-04T10:13:48.5752770Z I1204 10:05:30.820000 89206 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 89259 2025-12-04T10:13:48.5753342Z I1204 10:05:30.821000 89206 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 89260 2025-12-04T10:13:48.5753812Z I1204 10:05:30.822000 89206 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 89261 2025-12-04T10:13:48.5755795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5755938Z _warn_cpu_init() 2025-12-04T10:13:48.5757880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5757977Z _warn_cpu_init() 2025-12-04T10:13:48.5759911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5760017Z _warn_cpu_init() 2025-12-04T10:13:48.5761975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5762189Z _warn_cpu_init() 2025-12-04T10:13:48.5763205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.5763314Z return func(*args, **kwargs) 2025-12-04T10:13:48.5763720Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5764197Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5765086Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5765533Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5766436Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5766786Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5767640Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5768098Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5768943Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5769380Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5770254Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5770652Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5771503Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5771944Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5773519Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 2025-12-04T10:13:48.5774040Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5774705Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5775933Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5776304Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5777013Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5777563Z [rank0]:E1204 10:05:39.559000 89258 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.5778012Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5778540Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5779755Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5780332Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5781322Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5781760Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5782727Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5783208Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5784159Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5784684Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5785638Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5786096Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5787047Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5787539Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5789250Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T10:13:48.5789609Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5790310Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5791553Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5791899Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5792567Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5793084Z [rank1]:E1204 10:05:39.560000 89259 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.5793504Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5793996Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5794975Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5795448Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5796370Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5796769Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5797678Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5798131Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5799050Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5799508Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5800401Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5800828Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5801789Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5802227Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5803768Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T10:13:48.5804087Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5804677Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5805741Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5806064Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5806691Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5807172Z [rank2]:E1204 10:05:39.561000 89260 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.5807603Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5808067Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5808954Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5809433Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5810313Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5810659Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5811511Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5811966Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5812805Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5813295Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5814378Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5814822Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5815779Z [rank3]:E1204 10:05:39.562000 89261 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5816271Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5818007Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:48.5818365Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5819026Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5820199Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5820567Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5821272Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5821846Z [rank3]:E1204 10:05:39.562000 89261 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.5821947Z dist init r=1, world=4 2025-12-04T10:13:48.5822042Z dist init r=2, world=4 2025-12-04T10:13:48.5822143Z dist init r=3, world=4 2025-12-04T10:13:48.5822236Z dist init r=0, world=4 2025-12-04T10:13:48.5823378Z [rank0]:[W1204 10:05:39.074098993 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.5823511Z FAILED [10.5916s] [100%] 2025-12-04T10:13:48.5823520Z 2025-12-04T10:13:48.5823663Z =================================== FAILURES =================================== 2025-12-04T10:13:48.5824018Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:48.5824134Z Traceback (most recent call last): 2025-12-04T10:13:48.5824676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.5824824Z self._join_processes(fn) 2025-12-04T10:13:48.5825406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.5825553Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.5826252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.5826360Z raise RuntimeError(error) 2025-12-04T10:13:48.5826697Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.5826807Z Traceback (most recent call last): 2025-12-04T10:13:48.5827307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5827422Z getattr(self, test_name)() 2025-12-04T10:13:48.5827917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5828005Z fn() 2025-12-04T10:13:48.5828475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5828567Z method(*args, **kwargs) 2025-12-04T10:13:48.5829044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5829138Z method(*args, **kwargs) 2025-12-04T10:13:48.5829635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5829732Z with policy(): 2025-12-04T10:13:48.5830203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5830314Z raise RuntimeError(msg) 2025-12-04T10:13:48.5831503Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 
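Editor's note: the ProcessGroupNCCL warning above fires because the child processes exit without calling destroy_process_group(). A minimal cleanup sketch; the rank/world_size/env wiring is illustrative and not taken from this CI job:

# Hedged sketch of the shutdown the ProcessGroupNCCL warning asks for.
import os
import torch.distributed as dist

def run(rank: int, world_size: int) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        ...  # per-rank test body
    finally:
        dist.destroy_process_group()  # silences the shutdown warning above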
2025-12-04T10:13:48.5831511Z 2025-12-04T10:13:48.5831719Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5832408Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5832415Z 2025-12-04T10:13:48.5832666Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5832670Z 2025-12-04T10:13:48.5832675Z 2025-12-04T10:13:48.5832884Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.5833158Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.5833914Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-69250cff44e166fa.xml - 2025-12-04T10:13:48.5834072Z =========================== short test summary info ============================ 2025-12-04T10:13:48.5834906Z FAILED [10.5916s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.5835050Z Traceback (most recent call last): 2025-12-04T10:13:48.5835565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5835670Z getattr(self, test_name)() 2025-12-04T10:13:48.5836173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5836253Z fn() 2025-12-04T10:13:48.5836737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5836860Z method(*args, **kwargs) 2025-12-04T10:13:48.5837330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5837432Z method(*args, **kwargs) 2025-12-04T10:13:48.5837902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5837999Z with policy(): 2025-12-04T10:13:48.5838476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5838574Z raise RuntimeError(msg) 2025-12-04T10:13:48.5839763Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 2025-12-04T10:13:48.5839770Z 2025-12-04T10:13:48.5839973Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5840670Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5840678Z 2025-12-04T10:13:48.5840925Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5841125Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
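Editor's note: the leak report above compares two numbers per device, the caching-allocator bytes and the driver-reported bytes, taken before and after the test body (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables this). A rough illustration of that kind of bookkeeping, not the actual CudaMemoryLeakCheck code in common_utils.py:

# Illustrative only: snapshot caching-allocator and driver usage around a test body.
import torch

def check_for_leak(fn, device: int = 0) -> None:
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)
    driver_before = total - free_before

    fn()  # test body under check

    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after
    if alloc_after > alloc_before:
        raise RuntimeError(
            f"Caching allocator allocated memory was {alloc_before} and is now "
            f"reported as {alloc_after} on device {device}. CUDA driver allocated "
            f"memory was {driver_before} and is now {driver_after}."
        )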
2025-12-04T10:13:48.5841290Z ====================== 1 failed, 32 deselected in 10.81s ======================= 2025-12-04T10:13:48.5841377Z Got exit code 1 2025-12-04T10:13:48.5841482Z Retrying single test... 2025-12-04T10:13:48.5842064Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-311a7d97c78eb59e.xml 2025-12-04T10:13:48.5842211Z ============================= test session starts ============================== 2025-12-04T10:13:48.5842537Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.5842634Z cachedir: .pytest_cache 2025-12-04T10:13:48.5843131Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.5843242Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.5843339Z configfile: pytest.ini 2025-12-04T10:13:48.5843843Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.5844042Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.5844833Z stepcurrent: skipping 22 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5844948Z Running 1 items in this shard 2025-12-04T10:13:48.5844952Z 2025-12-04T10:13:48.5845976Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda I1204 10:05:46.219000 89543 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 89595 2025-12-04T10:13:48.5846478Z I1204 10:05:46.220000 89543 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 89596 2025-12-04T10:13:48.5846936Z I1204 10:05:46.221000 89543 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 89597 2025-12-04T10:13:48.5847400Z I1204 10:05:46.222000 89543 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 89598 2025-12-04T10:13:48.5849298Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5849420Z _warn_cpu_init() 2025-12-04T10:13:48.5851298Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.5851390Z _warn_cpu_init() 2025-12-04T10:13:48.5853350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5853447Z _warn_cpu_init() 2025-12-04T10:13:48.5855635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5855729Z _warn_cpu_init() 2025-12-04T10:13:48.5856731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.5856838Z return func(*args, **kwargs) 2025-12-04T10:13:48.5857301Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5863383Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5864467Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5865081Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5866272Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5866688Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5867590Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5868042Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5868948Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5869428Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5870330Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5870748Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5872228Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5873072Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5876018Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 1. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.5876627Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5877745Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5880205Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5880864Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5882164Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5883204Z [rank1]:E1204 10:05:54.904000 89596 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.5884025Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5884957Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5886922Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5887910Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5889809Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5890682Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5892626Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5893570Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5895627Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5896498Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5898262Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5899103Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5900953Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5901888Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5905409Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 
2025-12-04T10:13:48.5906187Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5907326Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5909390Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5909904Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5910778Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5911304Z [rank0]:E1204 10:05:54.905000 89595 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.5911825Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5912343Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5913312Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5913842Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5914799Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5915192Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5916116Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5916621Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5917555Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5918024Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5918951Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5919380Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5920320Z [rank3]:E1204 10:05:54.905000 89598 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5920793Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5922496Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 3. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T10:13:48.5922846Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5923480Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5924637Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5924990Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5925684Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5926332Z [rank3]:E1204 10:05:54.905000 89598 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.5926769Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5927263Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5928228Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5928709Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5929633Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5930040Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5930941Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5931407Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5932305Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5932761Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5933947Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5934393Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5935361Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5935897Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5937612Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 2. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:48.5937974Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5938627Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5939819Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5940207Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5940922Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5941463Z [rank2]:E1204 10:05:54.906000 89597 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.5941571Z dist init r=2, world=4 2025-12-04T10:13:48.5941699Z dist init r=0, world=4 2025-12-04T10:13:48.5941791Z dist init r=3, world=4 2025-12-04T10:13:48.5941893Z dist init r=1, world=4 2025-12-04T10:13:48.5943051Z [rank0]:[W1204 10:05:55.416608478 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.5943149Z FAILED [10.5894s] [100%] 2025-12-04T10:13:48.5943157Z 2025-12-04T10:13:48.5943308Z =================================== FAILURES =================================== 2025-12-04T10:13:48.5943658Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:48.5943815Z Traceback (most recent call last): 2025-12-04T10:13:48.5944362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.5944472Z self._join_processes(fn) 2025-12-04T10:13:48.5945063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.5945204Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.5945922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.5946029Z raise RuntimeError(error) 2025-12-04T10:13:48.5946248Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.5946367Z Traceback (most recent call last): 2025-12-04T10:13:48.5946869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5946974Z getattr(self, test_name)() 2025-12-04T10:13:48.5947477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5947561Z fn() 2025-12-04T10:13:48.5948041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5948135Z method(*args, **kwargs) 2025-12-04T10:13:48.5948637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5948743Z method(*args, **kwargs) 2025-12-04T10:13:48.5949213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5949301Z with policy(): 2025-12-04T10:13:48.5949786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5949887Z raise RuntimeError(msg) 2025-12-04T10:13:48.5951073Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 2. CUDA driver allocated memory was 607059968 and is now 625934336. 
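Editor's note: the outer traceback above (_join_processes, _check_return_codes) is the multi-process test harness noticing that a child exited with code 10 and re-raising its exception. An outline of that spawn/join/check pattern; this is not the actual common_distributed.py code:

# Illustrative spawn/join/return-code pattern, analogous to the harness above.
import torch.multiprocessing as mp

def _worker(rank: int, world_size: int) -> None:
    ...  # per-rank test body; exits non-zero (e.g. 10) on failure

def run_in_processes(world_size: int = 4) -> None:
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=_worker, args=(r, world_size)) for r in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    for rank, p in enumerate(procs):
        if p.exitcode != 0:
            raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")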
2025-12-04T10:13:48.5951083Z 2025-12-04T10:13:48.5951284Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5951973Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5951994Z 2025-12-04T10:13:48.5952270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5952276Z 2025-12-04T10:13:48.5952280Z 2025-12-04T10:13:48.5952489Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.5952742Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.5953660Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-311a7d97c78eb59e.xml - 2025-12-04T10:13:48.5953865Z =========================== short test summary info ============================ 2025-12-04T10:13:48.5954746Z FAILED [10.5894s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.5954862Z Traceback (most recent call last): 2025-12-04T10:13:48.5955402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5955511Z getattr(self, test_name)() 2025-12-04T10:13:48.5956079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5956162Z fn() 2025-12-04T10:13:48.5956653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5956760Z method(*args, **kwargs) 2025-12-04T10:13:48.5957242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5957346Z method(*args, **kwargs) 2025-12-04T10:13:48.5957839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5957929Z with policy(): 2025-12-04T10:13:48.5958450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5958551Z raise RuntimeError(msg) 2025-12-04T10:13:48.5959766Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 2. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:48.5959774Z 2025-12-04T10:13:48.5959985Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5960722Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5960728Z 2025-12-04T10:13:48.5960986Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5961157Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:13:48.5961329Z ====================== 1 failed, 32 deselected in 10.81s ======================= 2025-12-04T10:13:48.5961426Z Got exit code 1 2025-12-04T10:13:48.5962057Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.5962456Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.5963057Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5f502e67619c39f3.xml 2025-12-04T10:13:48.5963212Z ============================= test session starts ============================== 2025-12-04T10:13:48.5963554Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.5963655Z cachedir: .pytest_cache 2025-12-04T10:13:48.5964186Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.5964306Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.5964406Z configfile: pytest.ini 2025-12-04T10:13:48.5964928Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.5965134Z collecting ... collected 60 items / 23 deselected / 37 selected 2025-12-04T10:13:48.5965299Z stepcurrent: skipping 23 already run items. 2025-12-04T10:13:48.5965415Z Running 10 items in this shard 2025-12-04T10:13:48.5965421Z 2025-12-04T10:13:48.5966477Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda I1204 10:06:01.699000 89880 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 89932 2025-12-04T10:13:48.5966961Z I1204 10:06:01.700000 89880 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 89933 2025-12-04T10:13:48.5967543Z I1204 10:06:01.701000 89880 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 89934 2025-12-04T10:13:48.5968033Z I1204 10:06:01.702000 89880 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 89935 2025-12-04T10:13:48.5969931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5970023Z _warn_cpu_init() 2025-12-04T10:13:48.5971909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.5972002Z _warn_cpu_init() 2025-12-04T10:13:48.5974176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5974277Z _warn_cpu_init() 2025-12-04T10:13:48.5976274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.5976372Z _warn_cpu_init() 2025-12-04T10:13:48.5977372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.5977480Z return func(*args, **kwargs) 2025-12-04T10:13:48.5977942Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5978512Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5979691Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5980210Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5981301Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5981700Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5982658Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5983199Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5984161Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5984642Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5985609Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.5986050Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.5987015Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.5987502Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.5989243Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 586088448 and is now 649003008. 2025-12-04T10:13:48.5989612Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5990271Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.5991508Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.5991848Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.5992526Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.5993138Z [rank3]:E1204 10:06:09.775000 89935 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.5993560Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.5994067Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.5995023Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.5995539Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.5996463Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.5996841Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.5997766Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5998223Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.5999125Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.5999582Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6000484Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6000922Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6002481Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6003467Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6005128Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 609157120 and is now 649003008. 
2025-12-04T10:13:48.6005488Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6006124Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6007280Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6007628Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6008363Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6008897Z [rank2]:E1204 10:06:09.776000 89934 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.6009332Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6009846Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6010846Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6011343Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6012296Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6012717Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6013909Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6014398Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6015359Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6015841Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6016800Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6017246Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6018242Z [rank1]:E1204 10:06:09.776000 89933 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6018732Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6020437Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 607059968 and is now 649003008. 2025-12-04T10:13:48.6020807Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6021459Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6022641Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6023031Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6023751Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6024294Z [rank1]:E1204 10:06:09.776000 89933 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.6024767Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6025307Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6026361Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6026841Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6027792Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6028168Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6029072Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6029526Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6030429Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6030880Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6031784Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6032229Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6033143Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6033596Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6035195Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 714014720 and is now 758054912. 2025-12-04T10:13:48.6035544Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6036156Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6037296Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6037641Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6038319Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6038871Z [rank0]:E1204 10:06:09.781000 89932 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.6038969Z dist init r=3, world=4 2025-12-04T10:13:48.6039064Z dist init r=2, world=4 2025-12-04T10:13:48.6039150Z dist init r=1, world=4 2025-12-04T10:13:48.6039238Z dist init r=0, world=4 2025-12-04T10:13:48.6040334Z [rank0]:[W1204 10:06:10.294331115 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.6040459Z FAILED [10.4153s] [ 10%] 2025-12-04T10:13:48.6040466Z 2025-12-04T10:13:48.6040610Z =================================== FAILURES =================================== 2025-12-04T10:13:48.6040930Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda _ 2025-12-04T10:13:48.6041041Z Traceback (most recent call last): 2025-12-04T10:13:48.6041560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.6041661Z self._join_processes(fn) 2025-12-04T10:13:48.6042218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.6042349Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.6042911Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.6043023Z raise RuntimeError(error) 2025-12-04T10:13:48.6043236Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.6043352Z Traceback (most recent call last): 2025-12-04T10:13:48.6043853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6043958Z getattr(self, test_name)() 2025-12-04T10:13:48.6044490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6044571Z fn() 2025-12-04T10:13:48.6045042Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6045142Z method(*args, **kwargs) 2025-12-04T10:13:48.6045613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6045715Z method(*args, **kwargs) 2025-12-04T10:13:48.6046184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6046272Z with policy(): 2025-12-04T10:13:48.6046752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6046854Z raise RuntimeError(msg) 2025-12-04T10:13:48.6048028Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T10:13:48.6048042Z 2025-12-04T10:13:48.6048270Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6048950Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6048958Z 2025-12-04T10:13:48.6049212Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6049217Z 2025-12-04T10:13:48.6049366Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.6049510Z Traceback (most recent call last): 2025-12-04T10:13:48.6050026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6050128Z getattr(self, test_name)() 2025-12-04T10:13:48.6050633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6050715Z fn() 2025-12-04T10:13:48.6051197Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6051293Z method(*args, **kwargs) 2025-12-04T10:13:48.6051794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6051889Z method(*args, **kwargs) 2025-12-04T10:13:48.6052359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6052458Z with policy(): 2025-12-04T10:13:48.6052935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6053044Z raise RuntimeError(msg) 2025-12-04T10:13:48.6054499Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 586088448 and is now 649003008. 2025-12-04T10:13:48.6054506Z 2025-12-04T10:13:48.6054723Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6055477Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6055484Z 2025-12-04T10:13:48.6055747Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6055753Z 2025-12-04T10:13:48.6055757Z 2025-12-04T10:13:48.6055989Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.6056284Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T10:13:48.6057094Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5f502e67619c39f3.xml - 2025-12-04T10:13:48.6057263Z =========================== short test summary info ============================ 2025-12-04T10:13:48.6058149Z FAILED [10.4153s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.6058279Z Traceback (most recent call last): 2025-12-04T10:13:48.6058825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6058946Z getattr(self, test_name)() 2025-12-04T10:13:48.6059477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6059565Z fn() 2025-12-04T10:13:48.6060077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6060179Z method(*args, **kwargs) 2025-12-04T10:13:48.6060707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6060817Z method(*args, **kwargs) 2025-12-04T10:13:48.6061316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6061423Z with policy(): 2025-12-04T10:13:48.6061924Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6062063Z raise RuntimeError(msg) 2025-12-04T10:13:48.6063304Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T10:13:48.6063310Z 2025-12-04T10:13:48.6063528Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6064249Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6064285Z 2025-12-04T10:13:48.6064542Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6064548Z 2025-12-04T10:13:48.6064707Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.6064835Z Traceback (most recent call last): 2025-12-04T10:13:48.6065383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6065608Z getattr(self, test_name)() 2025-12-04T10:13:48.6066230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6066310Z fn() 2025-12-04T10:13:48.6066791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6066889Z method(*args, **kwargs) 2025-12-04T10:13:48.6067360Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6067465Z method(*args, **kwargs) 2025-12-04T10:13:48.6067931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6068034Z with policy(): 2025-12-04T10:13:48.6068502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6068630Z raise RuntimeError(msg) 2025-12-04T10:13:48.6069806Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 586088448 and is now 649003008. 2025-12-04T10:13:48.6069812Z 2025-12-04T10:13:48.6070011Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6070698Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6070703Z 2025-12-04T10:13:48.6070944Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6071115Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.6071293Z ====================== 1 failed, 23 deselected in 10.63s ======================= 2025-12-04T10:13:48.6071385Z Got exit code 1 2025-12-04T10:13:48.6071487Z Retrying single test... 
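The two UserWarnings that every rank prints above point at concrete fixes for the noise around the failure rather than for the leak itself: give FSDP an explicit `device_id` so sharding initialization runs on the GPU, pass `device_id` to `init_process_group` to silence the barrier warning, and call `destroy_process_group()` before the process exits to address the ProcessGroupNCCL warning. A minimal sketch of that pattern, assuming the script is launched with torchrun (which provides LOCAL_RANK and the rendezvous environment) and using a throwaway Linear model rather than anything from this test:

import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.nn import Linear


def main() -> None:
    # One process per GPU; LOCAL_RANK is assumed to be set by the launcher.
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)
    dist.init_process_group("nccl", device_id=torch.device("cuda", rank))

    # device_id moves the CPU-resident module to the GPU for sharding init,
    # which is what the repeated FSDP UserWarning recommends.
    model = FSDP(
        Linear(8, 8),
        device_id=torch.cuda.current_device(),
        sync_module_states=True,
    )
    model(torch.randn(4, 8, device="cuda")).sum().backward()

    # Explicit teardown avoids the "destroy_process_group() was not called"
    # warning emitted at the end of each failing run above.
    dist.destroy_process_group()


if __name__ == "__main__":
    main()

This only addresses the warnings that precede the failure; it does not by itself change the mem_leak_check verdict reported below.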
2025-12-04T10:13:48.6072076Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8eee5abee9febb4.xml 2025-12-04T10:13:48.6072248Z ============================= test session starts ============================== 2025-12-04T10:13:48.6072581Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.6072686Z cachedir: .pytest_cache 2025-12-04T10:13:48.6073169Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.6073316Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.6073415Z configfile: pytest.ini 2025-12-04T10:13:48.6073925Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.6074130Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.6075062Z stepcurrent: skipping 23 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6075180Z Running 1 items in this shard 2025-12-04T10:13:48.6075186Z 2025-12-04T10:13:48.6076223Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda I1204 10:06:16.650000 90217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 90269 2025-12-04T10:13:48.6076757Z I1204 10:06:16.650000 90217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 90270 2025-12-04T10:13:48.6077230Z I1204 10:06:16.651000 90217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 90271 2025-12-04T10:13:48.6077704Z I1204 10:06:16.652000 90217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 90272 2025-12-04T10:13:48.6080006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6080110Z _warn_cpu_init() 2025-12-04T10:13:48.6082182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6082282Z _warn_cpu_init() 2025-12-04T10:13:48.6084286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6084383Z _warn_cpu_init() 2025-12-04T10:13:48.6086394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6086491Z _warn_cpu_init() 2025-12-04T10:13:48.6087542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.6087655Z return func(*args, **kwargs) 2025-12-04T10:13:48.6088118Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6088699Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6089693Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6090208Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6091296Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6091830Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6092734Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6093248Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6094532Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6095018Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6095980Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6096423Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6097425Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6097912Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6099622Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 0. CUDA driver allocated memory was 711917568 and is now 758054912. 2025-12-04T10:13:48.6099995Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6100656Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6101866Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6102226Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6102956Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6103496Z [rank0]:E1204 10:06:24.782000 90269 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.6103978Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6104516Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6105509Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6106022Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6107082Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6107467Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6108365Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6108820Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6109726Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6110179Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6111087Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6111526Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6112434Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6112984Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6114485Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T10:13:48.6114817Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6115397Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6116495Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6116817Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6117454Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6117960Z [rank1]:E1204 10:06:24.782000 90270 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.6118356Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6118836Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6119717Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6120691Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6121563Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6121922Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6122771Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6123202Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6124059Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6124492Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6125373Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6125767Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6126628Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6127059Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6128567Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T10:13:48.6128890Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6129499Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6130546Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6130892Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6131545Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6132022Z [rank2]:E1204 10:06:24.784000 90271 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.6132421Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6132922Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6134063Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6134579Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6135566Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6135971Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6136928Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6137417Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6138432Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6138920Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6139886Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6140329Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6141284Z [rank3]:E1204 10:06:24.784000 90272 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6141781Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6143503Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T10:13:48.6143879Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6144538Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6145711Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6146181Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6146819Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6147300Z [rank3]:E1204 10:06:24.784000 90272 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.6147416Z dist init r=1, world=4 2025-12-04T10:13:48.6147510Z dist init r=0, world=4 2025-12-04T10:13:48.6147594Z dist init r=3, world=4 2025-12-04T10:13:48.6147677Z dist init r=2, world=4 2025-12-04T10:13:48.6148705Z [rank0]:[W1204 10:06:25.291819729 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.6148797Z FAILED [10.3183s] [100%] 2025-12-04T10:13:48.6148803Z 2025-12-04T10:13:48.6148939Z =================================== FAILURES =================================== 2025-12-04T10:13:48.6149240Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda _ 2025-12-04T10:13:48.6149348Z Traceback (most recent call last): 2025-12-04T10:13:48.6149837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.6149938Z self._join_processes(fn) 2025-12-04T10:13:48.6150467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.6150593Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.6151129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.6151265Z raise RuntimeError(error) 2025-12-04T10:13:48.6151475Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.6151581Z Traceback (most recent call last): 2025-12-04T10:13:48.6152065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6152160Z getattr(self, test_name)() 2025-12-04T10:13:48.6152639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6152718Z fn() 2025-12-04T10:13:48.6153163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6153269Z method(*args, **kwargs) 2025-12-04T10:13:48.6153711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6153800Z method(*args, **kwargs) 2025-12-04T10:13:48.6154251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6154337Z with policy(): 2025-12-04T10:13:48.6154819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6154920Z raise RuntimeError(msg) 2025-12-04T10:13:48.6156029Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T10:13:48.6156073Z 2025-12-04T10:13:48.6156261Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6156905Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6156910Z 2025-12-04T10:13:48.6157154Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6157159Z 2025-12-04T10:13:48.6157163Z 2025-12-04T10:13:48.6157356Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.6157593Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.6158334Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8eee5abee9febb4.xml - 2025-12-04T10:13:48.6158484Z =========================== short test summary info ============================ 2025-12-04T10:13:48.6159274Z FAILED [10.3183s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.6159381Z Traceback (most recent call last): 2025-12-04T10:13:48.6159872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6159968Z getattr(self, test_name)() 2025-12-04T10:13:48.6160440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6160523Z fn() 2025-12-04T10:13:48.6160974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6161065Z method(*args, **kwargs) 2025-12-04T10:13:48.6161510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6161604Z method(*args, **kwargs) 2025-12-04T10:13:48.6162056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6162190Z with policy(): 2025-12-04T10:13:48.6162639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6162742Z raise RuntimeError(msg) 2025-12-04T10:13:48.6163839Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 607059968 and is now 649003008. 2025-12-04T10:13:48.6163847Z 2025-12-04T10:13:48.6164042Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6164681Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6164688Z 2025-12-04T10:13:48.6164917Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6165086Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
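The RuntimeError itself is raised by the mem-leak-check policy in torch/testing/_internal/common_utils.py (the __exit__ at line 2705 in the tracebacks above): it snapshots CUDA memory statistics before the test body and compares them afterwards, looking at both caching-allocator and driver-level allocations, as the error message shows. A rough stand-alone approximation of that idea using only public torch.cuda APIs (this is not the internal implementation, and it ignores the driver-allocated side that the real check also reports):

import gc

import torch


def check_for_cuda_leak(fn, device: int = 0) -> None:
    # Snapshot caching-allocator usage, run the test body, force cleanup,
    # and compare -- roughly what PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables.
    torch.cuda.synchronize(device)
    before = torch.cuda.memory_allocated(device)
    fn()
    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    after = torch.cuda.memory_allocated(device)
    if after > before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: "
            f"caching allocator went from {before} to {after} bytes"
        )

Wrapping the failing test body in something like this locally, together with the PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 repro command printed above, can help narrow down which allocation outlives the test.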
2025-12-04T10:13:48.6165240Z ====================== 1 failed, 32 deselected in 10.53s ======================= 2025-12-04T10:13:48.6165333Z Got exit code 1 2025-12-04T10:13:48.6165423Z Retrying single test... 2025-12-04T10:13:48.6166005Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-acff5684d72dd2d3.xml 2025-12-04T10:13:48.6166155Z ============================= test session starts ============================== 2025-12-04T10:13:48.6166462Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.6166554Z cachedir: .pytest_cache 2025-12-04T10:13:48.6167039Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.6167144Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.6167250Z configfile: pytest.ini 2025-12-04T10:13:48.6167725Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.6167914Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.6168632Z stepcurrent: skipping 23 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6168756Z Running 1 items in this shard 2025-12-04T10:13:48.6168761Z 2025-12-04T10:13:48.6169728Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda I1204 10:06:31.659000 90554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 90606 2025-12-04T10:13:48.6170165Z I1204 10:06:31.660000 90554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 90607 2025-12-04T10:13:48.6170598Z I1204 10:06:31.661000 90554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 90608 2025-12-04T10:13:48.6171036Z I1204 10:06:31.662000 90554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 90609 2025-12-04T10:13:48.6172832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6172937Z _warn_cpu_init() 2025-12-04T10:13:48.6175136Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.6175247Z _warn_cpu_init() 2025-12-04T10:13:48.6177222Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6177336Z _warn_cpu_init() 2025-12-04T10:13:48.6179590Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6179702Z _warn_cpu_init() 2025-12-04T10:13:48.6180695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.6180805Z return func(*args, **kwargs) 2025-12-04T10:13:48.6181319Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6181855Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6182860Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6183364Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6184394Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6184797Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6185763Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6186254Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6187205Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6187694Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6188647Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6189129Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6190103Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6190591Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6192310Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 1. CUDA driver allocated memory was 607059968 and is now 649003008. 2025-12-04T10:13:48.6192658Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6193280Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6194416Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6194762Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6195441Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6195981Z [rank1]:E1204 10:06:39.691000 90607 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.6196416Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6196909Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6197855Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6198358Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6199291Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6199673Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6200571Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6201030Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6201927Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6202393Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6203378Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6203798Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6204708Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6205193Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6208040Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 
2025-12-04T10:13:48.6208660Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6209977Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6211783Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6212426Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6213826Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6214769Z [rank0]:E1204 10:06:39.691000 90606 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.6215552Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6216475Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6218437Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6219328Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6221149Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6221892Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6223711Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6224661Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6226637Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6227651Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6229476Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6230283Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6231829Z [rank2]:E1204 10:06:39.691000 90608 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6232622Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6235595Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 2. CUDA driver allocated memory was 493813760 and is now 649003008. 2025-12-04T10:13:48.6236219Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6237319Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6239301Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6239999Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6241142Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6242077Z [rank2]:E1204 10:06:39.691000 90608 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.6242807Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6243321Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6244219Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6244667Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6245545Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6245907Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6246762Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6247388Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6248349Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6248819Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6249712Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6250132Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6251041Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6251501Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6253131Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T10:13:48.6253582Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6254410Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6255648Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6256013Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6256734Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6257314Z [rank3]:E1204 10:06:39.692000 90609 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.6257429Z dist init r=2, world=4 2025-12-04T10:13:48.6257530Z dist init r=0, world=4 2025-12-04T10:13:48.6257627Z dist init r=3, world=4 2025-12-04T10:13:48.6257732Z dist init r=1, world=4 2025-12-04T10:13:48.6258889Z [rank0]:[W1204 10:06:40.190860521 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.6258996Z FAILED [10.2106s] [100%] 2025-12-04T10:13:48.6259004Z 2025-12-04T10:13:48.6259156Z =================================== FAILURES =================================== 2025-12-04T10:13:48.6259501Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda _ 2025-12-04T10:13:48.6259632Z Traceback (most recent call last): 2025-12-04T10:13:48.6260173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.6260280Z self._join_processes(fn) 2025-12-04T10:13:48.6260869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.6261010Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.6261651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.6261765Z raise RuntimeError(error) 2025-12-04T10:13:48.6261997Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.6262126Z Traceback (most recent call last): 2025-12-04T10:13:48.6262661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6262786Z getattr(self, test_name)() 2025-12-04T10:13:48.6263315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6263404Z fn() 2025-12-04T10:13:48.6263916Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6264025Z method(*args, **kwargs) 2025-12-04T10:13:48.6264539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6264641Z method(*args, **kwargs) 2025-12-04T10:13:48.6265192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6265299Z with policy(): 2025-12-04T10:13:48.6265907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6266121Z raise RuntimeError(msg) 2025-12-04T10:13:48.6267239Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 2. CUDA driver allocated memory was 493813760 and is now 649003008. 
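Note: two warnings in this run concern process-group setup and teardown: c10d_logger.py suggests specifying `device_id` in `init_process_group` to mute the barrier() warning, and ProcessGroupNCCL warns that `destroy_process_group()` was not called before exit. A minimal sketch of both on a recent PyTorch, assuming the launcher sets MASTER_ADDR/MASTER_PORT and there is one GPU per rank:

import torch
import torch.distributed as dist

def worker(rank: int, world_size: int) -> None:
    # Binding the group to a specific device silences the barrier() warning.
    dist.init_process_group(
        "nccl",
        rank=rank,
        world_size=world_size,
        device_id=torch.device("cuda", rank),
    )
    try:
        dist.barrier()
        # ... test body ...
    finally:
        # Explicit teardown avoids the ProcessGroupNCCL shutdown warning.
        dist.destroy_process_group()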
2025-12-04T10:13:48.6267273Z 2025-12-04T10:13:48.6267471Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6268116Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6268121Z 2025-12-04T10:13:48.6268359Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6268364Z 2025-12-04T10:13:48.6268369Z 2025-12-04T10:13:48.6268601Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.6268832Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.6269543Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-acff5684d72dd2d3.xml - 2025-12-04T10:13:48.6269702Z =========================== short test summary info ============================ 2025-12-04T10:13:48.6270493Z FAILED [10.2106s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.6270612Z Traceback (most recent call last): 2025-12-04T10:13:48.6271097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6271197Z getattr(self, test_name)() 2025-12-04T10:13:48.6271688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6271766Z fn() 2025-12-04T10:13:48.6272221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6272316Z method(*args, **kwargs) 2025-12-04T10:13:48.6272760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6272895Z method(*args, **kwargs) 2025-12-04T10:13:48.6273342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6273428Z with policy(): 2025-12-04T10:13:48.6273887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6273981Z raise RuntimeError(msg) 2025-12-04T10:13:48.6275091Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 2. CUDA driver allocated memory was 493813760 and is now 649003008. 2025-12-04T10:13:48.6275098Z 2025-12-04T10:13:48.6275291Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6275936Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6275947Z 2025-12-04T10:13:48.6276179Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6276336Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:13:48.6276530Z ====================== 1 failed, 32 deselected in 10.43s ======================= 2025-12-04T10:13:48.6276617Z Got exit code 1 2025-12-04T10:13:48.6277187Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.6277555Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.6278132Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-72a03384c8cb338e.xml 2025-12-04T10:13:48.6278286Z ============================= test session starts ============================== 2025-12-04T10:13:48.6278733Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.6278848Z cachedir: .pytest_cache 2025-12-04T10:13:48.6279521Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.6279644Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.6279820Z configfile: pytest.ini 2025-12-04T10:13:48.6280358Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.6280572Z collecting ... collected 60 items / 24 deselected / 36 selected 2025-12-04T10:13:48.6280719Z stepcurrent: skipping 24 already run items. 2025-12-04T10:13:48.6280831Z Running 9 items in this shard 2025-12-04T10:13:48.6280837Z 2025-12-04T10:13:48.6281898Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda I1204 10:06:46.690000 90891 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 90943 2025-12-04T10:13:48.6282401Z I1204 10:06:46.691000 90891 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 90944 2025-12-04T10:13:48.6282895Z I1204 10:06:46.691000 90891 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 90945 2025-12-04T10:13:48.6283391Z I1204 10:06:46.692000 90891 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 90946 2025-12-04T10:13:48.6284386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6284530Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.6286582Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6286683Z _warn_cpu_init() 2025-12-04T10:13:48.6287682Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.6287810Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.6289826Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6289923Z _warn_cpu_init() 2025-12-04T10:13:48.6290965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6291298Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.6292296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6292532Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.6293488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.6293595Z return func(*args, **kwargs) 2025-12-04T10:13:48.6294742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6294910Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.6295900Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6296031Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.6298042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6298139Z _warn_cpu_init() 2025-12-04T10:13:48.6300138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.6300263Z _warn_cpu_init() 2025-12-04T10:13:48.6301271Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6301490Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.6302474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6302701Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.6303469Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6303589Z return func(*args, **kwargs) 2025-12-04T10:13:48.6304350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6304462Z return func(*args, **kwargs) 2025-12-04T10:13:48.6305273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6305383Z return func(*args, **kwargs) 2025-12-04T10:13:48.6306192Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6306287Z return func(*args, **kwargs) 2025-12-04T10:13:48.6306984Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6307087Z return func(*args, **kwargs) 2025-12-04T10:13:48.6307755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6307851Z return func(*args, **kwargs) 2025-12-04T10:13:48.6308525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6308648Z return func(*args, **kwargs) 2025-12-04T10:13:48.6309329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T10:13:48.6309428Z return func(*args, **kwargs) 2025-12-04T10:13:48.6309837Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6310319Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6311205Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6311667Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6312538Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6312899Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6313772Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6314207Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6315071Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6315499Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6316357Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6316752Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6317635Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6318067Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6319545Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 0. CUDA driver allocated memory was 718209024 and is now 760152064. 
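Note: the UserWarning surfaced through torch/utils/_contextlib.py above states that under the NO_SHARD strategy a full state dict is returned. The sketch below simply makes that explicit by requesting FULL_STATE_DICT through FSDP's state-dict-type context manager; it uses a placeholder wrapped model and is illustrative rather than the test's own code.

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, StateDictType

def full_state_dict(fsdp_model: FSDP) -> dict:
    # Under NO_SHARD the warning notes an unsharded (full) state dict is
    # returned anyway; requesting FULL_STATE_DICT states that intent explicitly.
    with FSDP.state_dict_type(fsdp_model, StateDictType.FULL_STATE_DICT):
        return fsdp_model.state_dict()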
2025-12-04T10:13:48.6319906Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6320486Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6321514Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6321865Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6322508Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6322992Z [rank0]:E1204 10:06:54.815000 90943 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.6323388Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6323865Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6324749Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6325205Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6326084Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6326466Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6327315Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6327747Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6328732Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6329523Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6330989Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6331490Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6332411Z [rank1]:E1204 10:06:54.817000 90944 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6332873Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6334816Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.6335193Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6335856Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6337050Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6337413Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6338145Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6338688Z [rank1]:E1204 10:06:54.817000 90944 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.6339135Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6339670Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6340675Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6341193Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6342209Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6342612Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6343572Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6344060Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6345027Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6345509Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6346535Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6346932Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6347790Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6348249Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6349736Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 267321344 and is now 651100160. 2025-12-04T10:13:48.6350063Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6350669Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6351685Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6352005Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6352647Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6353127Z [rank3]:E1204 10:06:54.817000 90946 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.6353526Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6354000Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6354922Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6355380Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6356247Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6356609Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6357454Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6357885Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6358740Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6359198Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6360051Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6360443Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6361335Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6361764Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6363250Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 2. CUDA driver allocated memory was 604962816 and is now 651100160. 
2025-12-04T10:13:48.6363603Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6364181Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6365413Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6365754Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6366612Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6367140Z [rank2]:E1204 10:06:54.817000 90945 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.6367325Z dist init r=1, world=4 2025-12-04T10:13:48.6367433Z dist init r=2, world=4 2025-12-04T10:13:48.6367552Z dist init r=0, world=4 2025-12-04T10:13:48.6367655Z dist init r=3, world=4 2025-12-04T10:13:48.6368831Z [rank0]:[W1204 10:06:55.330204016 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.6368937Z FAILED [10.2311s] [ 11%] 2025-12-04T10:13:48.6368944Z 2025-12-04T10:13:48.6369101Z =================================== FAILURES =================================== 2025-12-04T10:13:48.6369408Z __ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda __ 2025-12-04T10:13:48.6369539Z Traceback (most recent call last): 2025-12-04T10:13:48.6370071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.6370177Z self._join_processes(fn) 2025-12-04T10:13:48.6370761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.6370899Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.6371488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.6371612Z raise RuntimeError(error) 2025-12-04T10:13:48.6371838Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.6371993Z Traceback (most recent call last): 2025-12-04T10:13:48.6372517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6372624Z getattr(self, test_name)() 2025-12-04T10:13:48.6373153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6373336Z fn() 2025-12-04T10:13:48.6374009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6374119Z method(*args, **kwargs) 2025-12-04T10:13:48.6374661Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6374779Z method(*args, **kwargs) 2025-12-04T10:13:48.6375290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6375390Z with policy(): 2025-12-04T10:13:48.6375992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6376138Z raise RuntimeError(msg) 2025-12-04T10:13:48.6377384Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.6377394Z 2025-12-04T10:13:48.6377618Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6378315Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6378330Z 2025-12-04T10:13:48.6378797Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6378809Z 2025-12-04T10:13:48.6378814Z 2025-12-04T10:13:48.6379043Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.6379315Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.6380109Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-72a03384c8cb338e.xml - 2025-12-04T10:13:48.6380292Z =========================== short test summary info ============================ 2025-12-04T10:13:48.6381221Z FAILED [10.2311s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.6381342Z Traceback (most recent call last): 2025-12-04T10:13:48.6381910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6382019Z getattr(self, test_name)() 2025-12-04T10:13:48.6382556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6382661Z fn() 2025-12-04T10:13:48.6383166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6383281Z method(*args, **kwargs) 2025-12-04T10:13:48.6383781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6383884Z method(*args, **kwargs) 2025-12-04T10:13:48.6384397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6384493Z with policy(): 2025-12-04T10:13:48.6385048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6385159Z raise RuntimeError(msg) 2025-12-04T10:13:48.6386374Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.6386380Z 2025-12-04T10:13:48.6386646Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6387345Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6387351Z 2025-12-04T10:13:48.6387623Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6387799Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.6387975Z ====================== 1 failed, 24 deselected in 10.45s ======================= 2025-12-04T10:13:48.6388081Z Got exit code 1 2025-12-04T10:13:48.6388223Z Retrying single test... 2025-12-04T10:13:48.6388848Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-efc6da476f35386f.xml 2025-12-04T10:13:48.6389015Z ============================= test session starts ============================== 2025-12-04T10:13:48.6389362Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.6389476Z cachedir: .pytest_cache 2025-12-04T10:13:48.6389993Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.6390114Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.6390229Z configfile: pytest.ini 2025-12-04T10:13:48.6390860Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.6391072Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.6391803Z stepcurrent: skipping 24 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6391908Z Running 1 items in this shard 2025-12-04T10:13:48.6391914Z 2025-12-04T10:13:48.6392915Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda I1204 10:07:01.670000 91228 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 91280 2025-12-04T10:13:48.6393413Z I1204 10:07:01.671000 91228 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 91281 2025-12-04T10:13:48.6393889Z I1204 10:07:01.671000 91228 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 91282 2025-12-04T10:13:48.6394368Z I1204 10:07:01.672000 91228 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 91283 2025-12-04T10:13:48.6395313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.6395449Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.6397340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6397447Z _warn_cpu_init() 2025-12-04T10:13:48.6398404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6398536Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.6399463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6399612Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.6401496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6401632Z _warn_cpu_init() 2025-12-04T10:13:48.6403516Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6403610Z _warn_cpu_init() 2025-12-04T10:13:48.6404550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6404763Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.6405697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.6405976Z return func(*args, **kwargs) 2025-12-04T10:13:48.6406935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.6407161Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.6408140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6408363Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.6409312Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6409441Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.6411396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6411493Z _warn_cpu_init() 2025-12-04T10:13:48.6412488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6412700Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.6413522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6413840Z return func(*args, **kwargs) 2025-12-04T10:13:48.6414608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6414739Z return func(*args, **kwargs) 2025-12-04T10:13:48.6415504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6415617Z return func(*args, **kwargs) 2025-12-04T10:13:48.6416383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6416522Z return func(*args, **kwargs) 2025-12-04T10:13:48.6417299Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6417406Z return func(*args, **kwargs) 2025-12-04T10:13:48.6418167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6418285Z return func(*args, **kwargs) 2025-12-04T10:13:48.6419044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
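The `_warn_cpu_init()` messages above come from FSDP being handed modules that still live on CPU; the warning's own suggestion is to pass `device_id` so sharding initialization runs on the GPU. A minimal sketch of that recommendation, assuming a single-node multi-GPU launch via torchrun and a toy nn.Linear standing in for the test model (the script name and module are illustrative, not taken from the test suite):

# Launch with: torchrun --nproc_per_node=4 fsdp_device_id_sketch.py  (illustrative)
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")
    rank = dist.get_rank()            # single-node assumption: rank == local GPU index
    torch.cuda.set_device(rank)
    module = nn.Linear(1024, 1024)    # built on CPU, as in the test
    # device_id tells FSDP to move the module to this rank's GPU before
    # sharding initialization, which is what the warning recommends.
    model = FSDP(module, device_id=torch.cuda.current_device())
    out = model(torch.randn(8, 1024, device="cuda"))
    out.sum().backward()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()

With `device_id` set, FSDP moves the module onto the rank's GPU before sharding, which also satisfies the `sync_module_states=True` requirement the warning mentions.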
2025-12-04T10:13:48.6419163Z return func(*args, **kwargs) 2025-12-04T10:13:48.6419918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6420030Z return func(*args, **kwargs) 2025-12-04T10:13:48.6420499Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6421033Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6422085Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6422594Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6423584Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6423989Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6424952Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6425453Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6426566Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6427006Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6427851Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6428268Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6429133Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6429569Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6431070Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160.
2025-12-04T10:13:48.6431486Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6432076Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6433099Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6433430Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6434069Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6434548Z [rank1]:E1204 10:07:09.764000 91281 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.6434984Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6435455Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6436351Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6436803Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6437671Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6438030Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6438881Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6439347Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6440198Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6440639Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6441511Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6441905Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6442762Z [rank0]:E1204 10:07:09.764000 91280 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6443220Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6444708Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 2025-12-04T10:13:48.6445032Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6445620Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6446634Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6446965Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6447605Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6448114Z [rank0]:E1204 10:07:09.764000 91280 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.6448521Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6448995Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6449906Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6450359Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6451235Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6451592Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6452469Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6452914Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6454050Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6454599Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6455555Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6456002Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6456974Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6457494Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6459175Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T10:13:48.6459544Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6460210Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6461381Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6461747Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6462507Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6463051Z [rank2]:E1204 10:07:09.766000 91282 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.6463515Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6464043Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6465052Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6465664Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6466536Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6466924Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6467775Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6468216Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6469098Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6469536Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6470382Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6470801Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6471655Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6472092Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6473577Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 604962816 and is now 651100160. 
2025-12-04T10:13:48.6473899Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6474491Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6475530Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6475853Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6476501Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6476986Z [rank3]:E1204 10:07:09.773000 91283 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.6477087Z dist init r=0, world=4 2025-12-04T10:13:48.6477175Z dist init r=2, world=4 2025-12-04T10:13:48.6477258Z dist init r=1, world=4 2025-12-04T10:13:48.6477356Z dist init r=3, world=4 2025-12-04T10:13:48.6478385Z [rank0]:[W1204 10:07:10.274873401 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.6478485Z FAILED [10.2316s] [100%] 2025-12-04T10:13:48.6478490Z 2025-12-04T10:13:48.6478752Z =================================== FAILURES =================================== 2025-12-04T10:13:48.6479244Z __ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda __ 2025-12-04T10:13:48.6479379Z Traceback (most recent call last): 2025-12-04T10:13:48.6479928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.6480052Z self._join_processes(fn) 2025-12-04T10:13:48.6480635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.6481299Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.6481913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.6482027Z raise RuntimeError(error) 2025-12-04T10:13:48.6482258Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.6482384Z Traceback (most recent call last): 2025-12-04T10:13:48.6482963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6483135Z getattr(self, test_name)() 2025-12-04T10:13:48.6484138Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6484300Z fn() 2025-12-04T10:13:48.6485230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6485412Z method(*args, **kwargs) 2025-12-04T10:13:48.6486320Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6486512Z method(*args, **kwargs) 2025-12-04T10:13:48.6487427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6487622Z with policy(): 2025-12-04T10:13:48.6488570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6488758Z raise RuntimeError(msg) 2025-12-04T10:13:48.6490881Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T10:13:48.6490907Z 2025-12-04T10:13:48.6491381Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6492752Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6492767Z 2025-12-04T10:13:48.6493305Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6493315Z 2025-12-04T10:13:48.6493324Z 2025-12-04T10:13:48.6493704Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.6494363Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.6495850Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-efc6da476f35386f.xml - 2025-12-04T10:13:48.6496182Z =========================== short test summary info ============================ 2025-12-04T10:13:48.6497756Z FAILED [10.2316s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.6497966Z Traceback (most recent call last): 2025-12-04T10:13:48.6498988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6499189Z getattr(self, test_name)() 2025-12-04T10:13:48.6500288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6500449Z fn() 2025-12-04T10:13:48.6501431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6501633Z method(*args, **kwargs) 2025-12-04T10:13:48.6502587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6502840Z method(*args, **kwargs) 2025-12-04T10:13:48.6503827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6504012Z with policy(): 2025-12-04T10:13:48.6505024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6505230Z raise RuntimeError(msg) 2025-12-04T10:13:48.6507577Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T10:13:48.6507673Z 2025-12-04T10:13:48.6508046Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6509282Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6509298Z 2025-12-04T10:13:48.6509753Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6510058Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.6510372Z ====================== 1 failed, 32 deselected in 10.45s ======================= 2025-12-04T10:13:48.6510539Z Got exit code 1 2025-12-04T10:13:48.6510709Z Retrying single test... 2025-12-04T10:13:48.6511814Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ecdb0e90ac1c2bc1.xml 2025-12-04T10:13:48.6512101Z ============================= test session starts ============================== 2025-12-04T10:13:48.6512724Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.6512920Z cachedir: .pytest_cache 2025-12-04T10:13:48.6513849Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.6514084Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.6514371Z configfile: pytest.ini 2025-12-04T10:13:48.6515357Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.6515750Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.6517177Z stepcurrent: skipping 24 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6517372Z Running 1 items in this shard 2025-12-04T10:13:48.6517393Z 2025-12-04T10:13:48.6519373Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda I1204 10:07:16.580000 91565 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 91617 2025-12-04T10:13:48.6520292Z I1204 10:07:16.580000 91565 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 91618 2025-12-04T10:13:48.6521143Z I1204 10:07:16.581000 91565 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 91619 2025-12-04T10:13:48.6522057Z I1204 10:07:16.582000 91565 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 91620 2025-12-04T10:13:48.6523836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T10:13:48.6524094Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.6525983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6526127Z _warn_cpu_init() 2025-12-04T10:13:48.6527243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6527381Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.6529359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6529473Z _warn_cpu_init() 2025-12-04T10:13:48.6530436Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6530655Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.6531620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6531747Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.6532712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6532837Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T10:13:48.6535156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6535261Z _warn_cpu_init() 2025-12-04T10:13:48.6537266Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.6537382Z _warn_cpu_init() 2025-12-04T10:13:48.6538372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6538635Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.6539626Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.6539757Z return func(*args, **kwargs) 2025-12-04T10:13:48.6540748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6541000Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.6542005Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T10:13:48.6542221Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T10:13:48.6542992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6543134Z return func(*args, **kwargs) 2025-12-04T10:13:48.6543899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6544018Z return func(*args, **kwargs) 2025-12-04T10:13:48.6544781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6544894Z return func(*args, **kwargs) 2025-12-04T10:13:48.6545778Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6545883Z return func(*args, **kwargs) 2025-12-04T10:13:48.6546627Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6546730Z return func(*args, **kwargs) 2025-12-04T10:13:48.6547570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6547676Z return func(*args, **kwargs) 2025-12-04T10:13:48.6548417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T10:13:48.6548550Z return func(*args, **kwargs) 2025-12-04T10:13:48.6549262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T10:13:48.6549373Z return func(*args, **kwargs) 2025-12-04T10:13:48.6549805Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6550312Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6551265Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6551747Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6552716Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6553093Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6553996Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6554496Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6555398Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6555862Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6556791Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6557221Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6558132Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6558592Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6560174Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T10:13:48.6560518Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6561151Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6562256Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6562611Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6563283Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6563798Z [rank1]:E1204 10:07:24.594000 91618 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.6564233Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6564732Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6565677Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6566181Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6567118Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6567487Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6568415Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6568883Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6569782Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6570298Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6571188Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6571622Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6572532Z [rank0]:E1204 10:07:24.594000 91617 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6572993Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6574879Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 0. CUDA driver allocated memory was 714014720 and is now 760152064. 2025-12-04T10:13:48.6575250Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6575968Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6577121Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6577493Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6578206Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6578949Z [rank0]:E1204 10:07:24.594000 91617 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.6579420Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6579950Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6581033Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6581541Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6582541Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6582985Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6583944Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6584440Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6585440Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6585938Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6586898Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6587352Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6588311Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6588806Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6590527Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.6590997Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6591646Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6592756Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6593120Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6593814Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6594348Z [rank3]:E1204 10:07:24.594000 91620 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.6594798Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6595363Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6596343Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6596832Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6597831Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6598323Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6599227Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6599724Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6600623Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6601095Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6602004Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6602431Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6603514Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6603993Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6605664Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 2. CUDA driver allocated memory was 611254272 and is now 651100160. 
2025-12-04T10:13:48.6606018Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6606660Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6607775Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6608132Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6608827Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6609378Z [rank2]:E1204 10:07:24.595000 91619 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.6609488Z dist init r=2, world=4 2025-12-04T10:13:48.6609587Z dist init r=3, world=4 2025-12-04T10:13:48.6609686Z dist init r=0, world=4 2025-12-04T10:13:48.6609775Z dist init r=1, world=4 2025-12-04T10:13:48.6610888Z [rank0]:[W1204 10:07:24.104928465 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.6611026Z FAILED [10.1911s] [100%] 2025-12-04T10:13:48.6611033Z 2025-12-04T10:13:48.6611177Z =================================== FAILURES =================================== 2025-12-04T10:13:48.6611493Z __ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda __ 2025-12-04T10:13:48.6611611Z Traceback (most recent call last): 2025-12-04T10:13:48.6612142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.6612262Z self._join_processes(fn) 2025-12-04T10:13:48.6612856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.6612995Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.6613839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.6613960Z raise RuntimeError(error) 2025-12-04T10:13:48.6614205Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.6614329Z Traceback (most recent call last): 2025-12-04T10:13:48.6614868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6614992Z getattr(self, test_name)() 2025-12-04T10:13:48.6615526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6615617Z fn() 2025-12-04T10:13:48.6616133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6616239Z method(*args, **kwargs) 2025-12-04T10:13:48.6616753Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6616859Z method(*args, **kwargs) 2025-12-04T10:13:48.6617392Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6617501Z with policy(): 2025-12-04T10:13:48.6618007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6618115Z raise RuntimeError(msg) 2025-12-04T10:13:48.6619352Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 0. CUDA driver allocated memory was 714014720 and is now 760152064. 2025-12-04T10:13:48.6619362Z 2025-12-04T10:13:48.6619575Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6620279Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6620287Z 2025-12-04T10:13:48.6620552Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6620557Z 2025-12-04T10:13:48.6620562Z 2025-12-04T10:13:48.6620792Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.6621050Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.6621884Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ecdb0e90ac1c2bc1.xml - 2025-12-04T10:13:48.6622068Z =========================== short test summary info ============================ 2025-12-04T10:13:48.6622934Z FAILED [10.1911s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.6623103Z Traceback (most recent call last): 2025-12-04T10:13:48.6623649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6623761Z getattr(self, test_name)() 2025-12-04T10:13:48.6624308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6624416Z fn() 2025-12-04T10:13:48.6625284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6625554Z method(*args, **kwargs) 2025-12-04T10:13:48.6626473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6626663Z method(*args, **kwargs) 2025-12-04T10:13:48.6627247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6627354Z with policy(): 2025-12-04T10:13:48.6627863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6627972Z raise RuntimeError(msg) 2025-12-04T10:13:48.6629172Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 0. CUDA driver allocated memory was 714014720 and is now 760152064. 2025-12-04T10:13:48.6629182Z 2025-12-04T10:13:48.6629498Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6630149Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6630167Z 2025-12-04T10:13:48.6630419Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6630588Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.6630824Z ====================== 1 failed, 32 deselected in 10.41s ======================= 2025-12-04T10:13:48.6630919Z Got exit code 1 2025-12-04T10:13:48.6631498Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T10:13:48.6631892Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.6632473Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9524e9df873b8be0.xml 2025-12-04T10:13:48.6632640Z ============================= test session starts ============================== 2025-12-04T10:13:48.6632968Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.6633072Z cachedir: .pytest_cache 2025-12-04T10:13:48.6633568Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.6633685Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.6633783Z configfile: pytest.ini 2025-12-04T10:13:48.6634298Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.6634533Z collecting ... collected 60 items / 25 deselected / 35 selected 2025-12-04T10:13:48.6634678Z stepcurrent: skipping 25 already run items. 2025-12-04T10:13:48.6634785Z Running 8 items in this shard 2025-12-04T10:13:48.6634791Z 2025-12-04T10:13:48.6635774Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda I1204 10:07:31.530000 91902 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 91954 2025-12-04T10:13:48.6636289Z I1204 10:07:31.531000 91902 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 91955 2025-12-04T10:13:48.6636751Z I1204 10:07:31.531000 91902 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 91956 2025-12-04T10:13:48.6637223Z I1204 10:07:31.532000 91902 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 91957 2025-12-04T10:13:48.6639315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6639460Z _warn_cpu_init() 2025-12-04T10:13:48.6641411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6641523Z _warn_cpu_init() 2025-12-04T10:13:48.6643471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6643568Z _warn_cpu_init() 2025-12-04T10:13:48.6645552Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6645652Z _warn_cpu_init() 2025-12-04T10:13:48.6646624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T10:13:48.6646735Z return func(*args, **kwargs) 2025-12-04T10:13:48.6647190Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6647716Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6648679Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6649234Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6650189Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6650582Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6651544Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6652024Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6652959Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6653565Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6654699Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6655147Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6656127Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6656617Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6658294Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 
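[editorial note] The repeated UserWarning from torch/distributed/fsdp/_init_utils.py above recommends passing `device_id` so FSDP's sharding initialization runs on GPU, which `sync_module_states=True` also needs. A minimal sketch of that construction, assuming a process group is already initialized by the test harness; the model here is a stand-in, not the test's nested wrapped model.

    # Hedged sketch of the warning's recommendation, not the test's own code.
    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    model = nn.Linear(8, 8)  # placeholder for the CPU-resident module in the test
    fsdp_model = FSDP(
        model,
        device_id=torch.cuda.current_device(),  # run sharding init on GPU, silencing the warning
        sync_module_states=True,                 # requires GPU communication, hence device_id
    )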
2025-12-04T10:13:48.6658658Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6659348Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6660494Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6660862Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6661592Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6662138Z [rank0]:E1204 10:07:40.003000 91954 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.6662601Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6663129Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6664157Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6664681Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6665797Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6666398Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6667307Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6667779Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6668884Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6669352Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6670293Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6670724Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6671662Z [rank3]:E1204 10:07:40.005000 91957 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6672138Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6673773Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.6674129Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6674765Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6675861Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6676211Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6676914Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6677452Z [rank3]:E1204 10:07:40.005000 91957 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.6677894Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6678433Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6679742Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6680261Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6681333Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6681733Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6682690Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6683216Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6684186Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6684673Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6685638Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6686080Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6687043Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6687533Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6689258Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:48.6689683Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6690377Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6691618Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6691969Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6692687Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6693319Z [rank1]:E1204 10:07:40.005000 91955 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.6693931Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6694474Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6695470Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6696023Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6697005Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6697406Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6698395Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6698874Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6699843Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6700328Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6701288Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6701732Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6702695Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6703209Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6704869Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 2. CUDA driver allocated memory was 607059968 and is now 625934336. 
2025-12-04T10:13:48.6705231Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6705990Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6707093Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6707440Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6708170Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6708697Z [rank2]:E1204 10:07:40.005000 91956 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.6708795Z dist init r=0, world=4 2025-12-04T10:13:48.6708902Z dist init r=2, world=4 2025-12-04T10:13:48.6708994Z dist init r=3, world=4 2025-12-04T10:13:48.6709097Z dist init r=1, world=4 2025-12-04T10:13:48.6710252Z [rank0]:[W1204 10:07:40.509922501 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.6710348Z FAILED [10.9862s] [ 12%] 2025-12-04T10:13:48.6710354Z 2025-12-04T10:13:48.6710505Z =================================== FAILURES =================================== 2025-12-04T10:13:48.6710809Z ____ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda ____ 2025-12-04T10:13:48.6710932Z Traceback (most recent call last): 2025-12-04T10:13:48.6711459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.6711599Z self._join_processes(fn) 2025-12-04T10:13:48.6712170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.6712305Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.6712891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.6713006Z raise RuntimeError(error) 2025-12-04T10:13:48.6713236Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.6713367Z Traceback (most recent call last): 2025-12-04T10:13:48.6713892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6714001Z getattr(self, test_name)() 2025-12-04T10:13:48.6714528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6714613Z fn() 2025-12-04T10:13:48.6715103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6715214Z method(*args, **kwargs) 2025-12-04T10:13:48.6715698Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6715835Z method(*args, **kwargs) 2025-12-04T10:13:48.6716319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6716413Z with policy(): 2025-12-04T10:13:48.6716919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6717026Z raise RuntimeError(msg) 2025-12-04T10:13:48.6718217Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 2. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:48.6718226Z 2025-12-04T10:13:48.6718435Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6719094Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6719099Z 2025-12-04T10:13:48.6719368Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6719373Z 2025-12-04T10:13:48.6719529Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.6719688Z Traceback (most recent call last): 2025-12-04T10:13:48.6720216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6720327Z getattr(self, test_name)() 2025-12-04T10:13:48.6720857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6720971Z fn() 2025-12-04T10:13:48.6721461Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6721602Z method(*args, **kwargs) 2025-12-04T10:13:48.6722091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6722198Z method(*args, **kwargs) 2025-12-04T10:13:48.6722685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6722781Z with policy(): 2025-12-04T10:13:48.6723280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6723413Z raise RuntimeError(msg) 2025-12-04T10:13:48.6724587Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T10:13:48.6724594Z 2025-12-04T10:13:48.6724801Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6725454Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6725460Z 2025-12-04T10:13:48.6725730Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6725736Z 2025-12-04T10:13:48.6725740Z 2025-12-04T10:13:48.6725954Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.6726219Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.6726996Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9524e9df873b8be0.xml - 2025-12-04T10:13:48.6727159Z =========================== short test summary info ============================ 2025-12-04T10:13:48.6728017Z FAILED [10.9862s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.6728133Z Traceback (most recent call last): 2025-12-04T10:13:48.6728676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6728781Z getattr(self, test_name)() 2025-12-04T10:13:48.6729300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6729396Z fn() 2025-12-04T10:13:48.6729884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6729995Z method(*args, **kwargs) 2025-12-04T10:13:48.6730484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6730585Z method(*args, **kwargs) 2025-12-04T10:13:48.6731081Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6731171Z with policy(): 2025-12-04T10:13:48.6731701Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6731816Z raise RuntimeError(msg) 2025-12-04T10:13:48.6732988Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 2. CUDA driver allocated memory was 607059968 and is now 625934336. 
2025-12-04T10:13:48.6732997Z 2025-12-04T10:13:48.6733317Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6734154Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6734163Z 2025-12-04T10:13:48.6734436Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6734442Z 2025-12-04T10:13:48.6734602Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.6734724Z Traceback (most recent call last): 2025-12-04T10:13:48.6735285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6735433Z getattr(self, test_name)() 2025-12-04T10:13:48.6735964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6736062Z fn() 2025-12-04T10:13:48.6736564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6736678Z method(*args, **kwargs) 2025-12-04T10:13:48.6737182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6737286Z method(*args, **kwargs) 2025-12-04T10:13:48.6737798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6737898Z with policy(): 2025-12-04T10:13:48.6738416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6738528Z raise RuntimeError(msg) 2025-12-04T10:13:48.6739728Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.6739736Z 2025-12-04T10:13:48.6739957Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6740662Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6740668Z 2025-12-04T10:13:48.6740942Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6741121Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.6741297Z ====================== 1 failed, 25 deselected in 11.20s ======================= 2025-12-04T10:13:48.6741403Z Got exit code 1 2025-12-04T10:13:48.6741507Z Retrying single test... 
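[editorial note] The ProcessGroupNCCL warning in this run notes that destroy_process_group() was not called before program exit, which can leak resources. A minimal sketch of the explicit shutdown it asks for, assuming env:// rendezvous as used by the test runner; the body between init and cleanup is schematic.

    # Hedged sketch of explicit process-group cleanup at exit.
    import torch.distributed as dist

    dist.init_process_group("nccl")   # assumption: MASTER_ADDR/MASTER_PORT etc. set in the environment
    try:
        ...                           # training / test body
    finally:
        dist.destroy_process_group()  # avoids the "can leak resources" warning at program exit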
2025-12-04T10:13:48.6742129Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ad9e0258dc223929.xml 2025-12-04T10:13:48.6742297Z ============================= test session starts ============================== 2025-12-04T10:13:48.6742637Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.6742754Z cachedir: .pytest_cache 2025-12-04T10:13:48.6743265Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.6743384Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.6743501Z configfile: pytest.ini 2025-12-04T10:13:48.6750658Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.6751010Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.6751767Z stepcurrent: skipping 25 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6751917Z Running 1 items in this shard 2025-12-04T10:13:48.6751924Z 2025-12-04T10:13:48.6752949Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda I1204 10:07:47.030000 92239 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 92291 2025-12-04T10:13:48.6753431Z I1204 10:07:47.031000 92239 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 92292 2025-12-04T10:13:48.6753905Z I1204 10:07:47.031000 92239 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 92293 2025-12-04T10:13:48.6754388Z I1204 10:07:47.032000 92239 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 92294 2025-12-04T10:13:48.6756380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6756486Z _warn_cpu_init() 2025-12-04T10:13:48.6758458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6758562Z _warn_cpu_init() 2025-12-04T10:13:48.6759518Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T10:13:48.6759636Z return func(*args, **kwargs) 2025-12-04T10:13:48.6761602Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6761695Z _warn_cpu_init() 2025-12-04T10:13:48.6763626Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6763720Z _warn_cpu_init() 2025-12-04T10:13:48.6764178Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6764726Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6765702Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6766192Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6767175Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6767565Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6768502Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6768977Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6769930Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6770401Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6771330Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6771757Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6772688Z [rank0]:E1204 10:07:55.488000 
92291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6773164Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6775130Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 2025-12-04T10:13:48.6775495Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6776160Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6777292Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6777654Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6778370Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6779118Z [rank0]:E1204 10:07:55.488000 92291 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.6779650Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6780178Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6781177Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6781724Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6782716Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6783120Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6784084Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6784612Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6785566Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6786053Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6787004Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6787446Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6788417Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6788901Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6790602Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 1. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.6791055Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6791675Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6792732Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6793074Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6793747Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6794281Z [rank1]:E1204 10:07:55.488000 92292 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.6794707Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6795199Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6796203Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6796675Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6797601Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6797975Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6799313Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6800176Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6801781Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6802627Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6804242Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6804954Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6806566Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6807474Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6810587Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 2. CUDA driver allocated memory was 607059968 and is now 625934336. 
2025-12-04T10:13:48.6811282Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6812443Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6814736Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6815385Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6816846Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6817879Z [rank2]:E1204 10:07:55.490000 92293 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.6818748Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6819837Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6821755Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6822683Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6824486Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6825392Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6827112Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6827941Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6829553Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6830390Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6831988Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6832705Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6834488Z [rank3]:E1204 10:07:55.490000 92294 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6835264Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6837688Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:48.6838040Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6838630Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6839701Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6840025Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6840669Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6841324Z [rank3]:E1204 10:07:55.490000 92294 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.6841464Z dist init r=3, world=4 2025-12-04T10:13:48.6841558Z dist init r=0, world=4 2025-12-04T10:13:48.6841645Z dist init r=1, world=4 2025-12-04T10:13:48.6841741Z dist init r=2, world=4 2025-12-04T10:13:48.6842824Z [rank0]:[W1204 10:07:55.996753571 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.6842928Z FAILED [11.0343s] [100%] 2025-12-04T10:13:48.6842935Z 2025-12-04T10:13:48.6843105Z =================================== FAILURES =================================== 2025-12-04T10:13:48.6843393Z ____ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda ____ 2025-12-04T10:13:48.6843519Z Traceback (most recent call last): 2025-12-04T10:13:48.6844031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.6844137Z self._join_processes(fn) 2025-12-04T10:13:48.6844698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.6844828Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.6845404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.6845509Z raise RuntimeError(error) 2025-12-04T10:13:48.6845730Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.6845849Z Traceback (most recent call last): 2025-12-04T10:13:48.6846355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6846458Z getattr(self, test_name)() 2025-12-04T10:13:48.6846966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6847051Z fn() 2025-12-04T10:13:48.6847566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6847662Z method(*args, **kwargs) 2025-12-04T10:13:48.6848135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6848238Z method(*args, **kwargs) 2025-12-04T10:13:48.6848710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6848811Z with policy(): 2025-12-04T10:13:48.6849393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6849490Z raise RuntimeError(msg) 2025-12-04T10:13:48.6850571Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 
2025-12-04T10:13:48.6850576Z 2025-12-04T10:13:48.6850767Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6851401Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6851407Z 2025-12-04T10:13:48.6851637Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6851645Z 2025-12-04T10:13:48.6851786Z Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.6851899Z Traceback (most recent call last): 2025-12-04T10:13:48.6852385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6852518Z getattr(self, test_name)() 2025-12-04T10:13:48.6852989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6853068Z fn() 2025-12-04T10:13:48.6853792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6853899Z method(*args, **kwargs) 2025-12-04T10:13:48.6854402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6854591Z method(*args, **kwargs) 2025-12-04T10:13:48.6855087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6855189Z with policy(): 2025-12-04T10:13:48.6855693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6855800Z raise RuntimeError(msg) 2025-12-04T10:13:48.6857017Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 1. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.6857024Z 2025-12-04T10:13:48.6857238Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6857917Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6857924Z 2025-12-04T10:13:48.6858182Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6858188Z 2025-12-04T10:13:48.6858192Z 2025-12-04T10:13:48.6858406Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.6858677Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T10:13:48.6859502Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ad9e0258dc223929.xml - 2025-12-04T10:13:48.6859676Z =========================== short test summary info ============================ 2025-12-04T10:13:48.6860520Z FAILED [11.0343s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.6860637Z Traceback (most recent call last): 2025-12-04T10:13:48.6861193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6861300Z getattr(self, test_name)() 2025-12-04T10:13:48.6861839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6861927Z fn() 2025-12-04T10:13:48.6862432Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6862545Z method(*args, **kwargs) 2025-12-04T10:13:48.6863043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6863154Z method(*args, **kwargs) 2025-12-04T10:13:48.6863678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6863775Z with policy(): 2025-12-04T10:13:48.6864287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6864395Z raise RuntimeError(msg) 2025-12-04T10:13:48.6865706Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 
2025-12-04T10:13:48.6865754Z 2025-12-04T10:13:48.6866069Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6866663Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6866670Z 2025-12-04T10:13:48.6866905Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6866937Z 2025-12-04T10:13:48.6867081Z Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.6867198Z Traceback (most recent call last): 2025-12-04T10:13:48.6867683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6867778Z getattr(self, test_name)() 2025-12-04T10:13:48.6868258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6868336Z fn() 2025-12-04T10:13:48.6868785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6868883Z method(*args, **kwargs) 2025-12-04T10:13:48.6869331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6869424Z method(*args, **kwargs) 2025-12-04T10:13:48.6869869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6869954Z with policy(): 2025-12-04T10:13:48.6870404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6870502Z raise RuntimeError(msg) 2025-12-04T10:13:48.6871589Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 1. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.6871602Z 2025-12-04T10:13:48.6871789Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6872384Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6872391Z 2025-12-04T10:13:48.6872629Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6872786Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.6872948Z ====================== 1 failed, 32 deselected in 11.25s ======================= 2025-12-04T10:13:48.6873036Z Got exit code 1 2025-12-04T10:13:48.6873127Z Retrying single test... 
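[editor note] The ProcessGroupNCCL warning printed during this run ("destroy_process_group() was not called before program exit, which can leak resources") points at missing teardown in the worker processes. Below is a minimal, hypothetical sketch of a worker that tears the process group down explicitly before exiting; it is not the code from test_fsdp_core.py, and it assumes torchrun-style environment variables (RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT) are already set.

import os
import torch
import torch.distributed as dist

def run_worker() -> None:
    # Hypothetical worker body; the real test logic lives in
    # test/distributed/fsdp/test_fsdp_core.py.
    rank = int(os.environ["RANK"])
    torch.cuda.set_device(rank % torch.cuda.device_count())
    # env:// rendezvous: relies on MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE.
    dist.init_process_group("nccl")
    try:
        pass  # training / test logic would go here
    finally:
        # Explicit teardown releases NCCL resources and avoids the
        # "destroy_process_group() was not called" warning above.
        dist.destroy_process_group()

if __name__ == "__main__":
    run_worker()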
2025-12-04T10:13:48.6873685Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-17a67113f8be5d53.xml 2025-12-04T10:13:48.6873826Z ============================= test session starts ============================== 2025-12-04T10:13:48.6874130Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.6874730Z cachedir: .pytest_cache 2025-12-04T10:13:48.6875191Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.6875313Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.6875403Z configfile: pytest.ini 2025-12-04T10:13:48.6875874Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.6876106Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.6876971Z stepcurrent: skipping 25 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6877075Z Running 1 items in this shard 2025-12-04T10:13:48.6877090Z 2025-12-04T10:13:48.6878061Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda I1204 10:08:02.479000 92576 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 92628 2025-12-04T10:13:48.6878524Z I1204 10:08:02.480000 92576 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 92629 2025-12-04T10:13:48.6879364Z I1204 10:08:02.481000 92576 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 92630 2025-12-04T10:13:48.6880026Z I1204 10:08:02.482000 92576 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 92631 2025-12-04T10:13:48.6882058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6882155Z _warn_cpu_init() 2025-12-04T10:13:48.6884167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6884268Z _warn_cpu_init() 2025-12-04T10:13:48.6886344Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6886441Z _warn_cpu_init() 2025-12-04T10:13:48.6888438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6888544Z _warn_cpu_init() 2025-12-04T10:13:48.6889534Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.6889651Z return func(*args, **kwargs) 2025-12-04T10:13:48.6890153Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6890686Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6891779Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6892311Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6893332Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6893887Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6894893Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6895373Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6896327Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6896818Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6897770Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6898224Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6899180Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6899703Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6901364Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 707723264 and is now 734986240. 2025-12-04T10:13:48.6901731Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6902385Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6903519Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6903884Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6904687Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6905400Z [rank0]:E1204 10:08:10.990000 92628 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.6905942Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6906415Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6907361Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6907806Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6908685Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6909058Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6909911Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6910340Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6911183Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6911621Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6912467Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6912868Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6913745Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6914185Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6915657Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T10:13:48.6915983Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6916565Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6917559Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6917907Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6918538Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6919024Z [rank1]:E1204 10:08:10.991000 92629 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.6919448Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6919917Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6920809Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6921261Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6922163Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6922513Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6923381Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6923814Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6924661Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6925094Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6925938Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6926362Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6927217Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6927657Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6929687Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 2. CUDA driver allocated memory was 607059968 and is now 625934336. 
2025-12-04T10:13:48.6930232Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6931144Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6932275Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6932626Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6933391Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6934138Z [rank2]:E1204 10:08:10.992000 92630 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.6934589Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6935120Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6936128Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6936670Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6937662Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6938056Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6939029Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6939516Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6940477Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6940974Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.6941958Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6942408Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.6943365Z [rank3]:E1204 10:08:10.993000 92631 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6943860Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.6945624Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.6946077Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6946697Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6947696Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6948028Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.6948683Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6949172Z [rank3]:E1204 10:08:10.993000 92631 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.6949260Z dist init r=1, world=4 2025-12-04T10:13:48.6949348Z dist init r=3, world=4 2025-12-04T10:13:48.6949442Z dist init r=2, world=4 2025-12-04T10:13:48.6949527Z dist init r=0, world=4 2025-12-04T10:13:48.6950548Z [rank0]:[W1204 10:08:11.501786057 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.6950671Z FAILED [10.9888s] [100%] 2025-12-04T10:13:48.6950678Z 2025-12-04T10:13:48.6950807Z =================================== FAILURES =================================== 2025-12-04T10:13:48.6951089Z ____ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda ____ 2025-12-04T10:13:48.6951198Z Traceback (most recent call last): 2025-12-04T10:13:48.6951682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.6951782Z self._join_processes(fn) 2025-12-04T10:13:48.6952299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.6952430Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.6952965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.6953061Z raise RuntimeError(error) 2025-12-04T10:13:48.6953274Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.6953383Z Traceback (most recent call last): 2025-12-04T10:13:48.6953859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6953992Z getattr(self, test_name)() 2025-12-04T10:13:48.6954461Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6954548Z fn() 2025-12-04T10:13:48.6954995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6955086Z method(*args, **kwargs) 2025-12-04T10:13:48.6955537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6955627Z method(*args, **kwargs) 2025-12-04T10:13:48.6956064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6956158Z with policy(): 2025-12-04T10:13:48.6956604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6956708Z raise RuntimeError(msg) 2025-12-04T10:13:48.6957820Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 
2025-12-04T10:13:48.6957827Z 2025-12-04T10:13:48.6958014Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6958617Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6958623Z 2025-12-04T10:13:48.6958852Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6958887Z 2025-12-04T10:13:48.6959039Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.6959145Z Traceback (most recent call last): 2025-12-04T10:13:48.6959629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6959733Z getattr(self, test_name)() 2025-12-04T10:13:48.6960204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6960284Z fn() 2025-12-04T10:13:48.6960726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6960846Z method(*args, **kwargs) 2025-12-04T10:13:48.6961296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6961386Z method(*args, **kwargs) 2025-12-04T10:13:48.6961838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6961921Z with policy(): 2025-12-04T10:13:48.6962371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6962473Z raise RuntimeError(msg) 2025-12-04T10:13:48.6963533Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.6963541Z 2025-12-04T10:13:48.6963736Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6964331Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6964338Z 2025-12-04T10:13:48.6964568Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6964573Z 2025-12-04T10:13:48.6964577Z 2025-12-04T10:13:48.6964804Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.6965035Z Process 1 terminated with exit code 10, terminating remaining processes. 
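[editor note] The UserWarnings earlier in this run recommend passing `device_id` both to `init_process_group` (to silence the barrier() warning about the device being inferred from the current context) and to FSDP (so sharding initialization runs on GPU instead of CPU and `_warn_cpu_init()` is not triggered). A hedged sketch of what that could look like; the module and rank handling are illustrative only and not taken from the test under failure.

import torch
import torch.distributed as dist
from torch import nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def init_fsdp(rank: int) -> FSDP:
    device = torch.device("cuda", rank)
    torch.cuda.set_device(device)
    # device_id here addresses the barrier() UserWarning; assumes torchrun-style
    # rendezvous environment variables are already set.
    dist.init_process_group("nccl", device_id=device)
    model = nn.Linear(16, 16)  # placeholder; the tests wrap a nested model instead
    # device_id tells FSDP to move the CPU-resident module to this GPU before
    # running sharding initialization, per the warning's recommendation.
    return FSDP(model, device_id=device)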
2025-12-04T10:13:48.6965745Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-17a67113f8be5d53.xml - 2025-12-04T10:13:48.6965894Z =========================== short test summary info ============================ 2025-12-04T10:13:48.6966653Z FAILED [10.9888s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.6966769Z Traceback (most recent call last): 2025-12-04T10:13:48.6967255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6967361Z getattr(self, test_name)() 2025-12-04T10:13:48.6967836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6967917Z fn() 2025-12-04T10:13:48.6968400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6968493Z method(*args, **kwargs) 2025-12-04T10:13:48.6968943Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6969045Z method(*args, **kwargs) 2025-12-04T10:13:48.6969491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6969612Z with policy(): 2025-12-04T10:13:48.6970061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6970159Z raise RuntimeError(msg) 2025-12-04T10:13:48.6971239Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 
2025-12-04T10:13:48.6971246Z 2025-12-04T10:13:48.6971437Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6972072Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6972077Z 2025-12-04T10:13:48.6972309Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6972315Z 2025-12-04T10:13:48.6972461Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.6972574Z Traceback (most recent call last): 2025-12-04T10:13:48.6973060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6973165Z getattr(self, test_name)() 2025-12-04T10:13:48.6973890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6973983Z fn() 2025-12-04T10:13:48.6974499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6974608Z method(*args, **kwargs) 2025-12-04T10:13:48.6975115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.6975218Z method(*args, **kwargs) 2025-12-04T10:13:48.6975720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.6975823Z with policy(): 2025-12-04T10:13:48.6976367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.6976478Z raise RuntimeError(msg) 2025-12-04T10:13:48.6977692Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.6977700Z 2025-12-04T10:13:48.6977912Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.6978810Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6978820Z 2025-12-04T10:13:48.6979088Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.6979275Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:13:48.6979455Z ====================== 1 failed, 32 deselected in 11.21s ======================= 2025-12-04T10:13:48.6979554Z Got exit code 1 2025-12-04T10:13:48.6980160Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T10:13:48.6980638Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.6981261Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c0681c30ea8c1a74.xml 2025-12-04T10:13:48.6981431Z ============================= test session starts ============================== 2025-12-04T10:13:48.6981776Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.6981930Z cachedir: .pytest_cache 2025-12-04T10:13:48.6982441Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.6982565Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.6982682Z configfile: pytest.ini 2025-12-04T10:13:48.6983211Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.6983426Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T10:13:48.6983570Z stepcurrent: skipping 26 already run items. 2025-12-04T10:13:48.6983721Z Running 7 items in this shard 2025-12-04T10:13:48.6983727Z 2025-12-04T10:13:48.6984824Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda I1204 10:08:17.990000 92913 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 92965 2025-12-04T10:13:48.6985323Z I1204 10:08:17.990000 92913 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 92966 2025-12-04T10:13:48.6985814Z I1204 10:08:17.991000 92913 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 92967 2025-12-04T10:13:48.6986305Z I1204 10:08:17.992000 92913 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 92968 2025-12-04T10:13:48.6988327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6988438Z _warn_cpu_init() 2025-12-04T10:13:48.6990580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.6990687Z _warn_cpu_init() 2025-12-04T10:13:48.6992558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6992655Z _warn_cpu_init() 2025-12-04T10:13:48.6994439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.6994532Z _warn_cpu_init() 2025-12-04T10:13:48.6995411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.6995511Z return func(*args, **kwargs) 2025-12-04T10:13:48.6995955Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.6996428Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.6997316Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.6997766Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.6998684Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.6999034Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.6999889Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7000327Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7001171Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7001608Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7002459Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7002893Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7003755Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7004185Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7005700Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 707723264 and is now 734986240. 2025-12-04T10:13:48.7006023Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7006615Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7007668Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7007999Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7008632Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7009141Z [rank0]:E1204 10:08:26.369000 92965 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.7009554Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7010023Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7010919Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7011390Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7012262Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7012625Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7013536Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7014195Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7015150Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7015645Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7016638Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7017084Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7018059Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7018545Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7020250Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 607059968 and is now 625934336. 
2025-12-04T10:13:48.7020612Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7021306Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7022476Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7022847Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7023593Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7024132Z [rank2]:E1204 10:08:26.371000 92967 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.7024589Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7025117Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7026230Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7026681Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7027555Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7027912Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7028757Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7029196Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7030034Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7030496Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7031345Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7031731Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7032589Z [rank1]:E1204 10:08:26.372000 92966 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7033020Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7034539Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T10:13:48.7034883Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7035477Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7036510Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7036865Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7037497Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7037980Z [rank1]:E1204 10:08:26.372000 92966 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.7038415Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7038883Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7039776Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7040225Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7041104Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7041466Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7042309Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7042748Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7043621Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7044064Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7044907Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7045297Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7046158Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7046586Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7048115Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.7048437Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7049022Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7050085Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7050412Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7051043Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7051566Z [rank3]:E1204 10:08:26.372000 92968 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.7051663Z dist init r=0, world=4 2025-12-04T10:13:48.7051751Z dist init r=1, world=4 2025-12-04T10:13:48.7051837Z dist init r=2, world=4 2025-12-04T10:13:48.7051933Z dist init r=3, world=4 2025-12-04T10:13:48.7052956Z [rank0]:[W1204 10:08:26.877117318 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.7053056Z FAILED [10.7380s] [ 14%] 2025-12-04T10:13:48.7053061Z 2025-12-04T10:13:48.7053269Z =================================== FAILURES =================================== 2025-12-04T10:13:48.7053578Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:48.7053868Z Traceback (most recent call last): 2025-12-04T10:13:48.7054414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.7054575Z self._join_processes(fn) 2025-12-04T10:13:48.7055156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.7055300Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.7055949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.7056061Z raise RuntimeError(error) 2025-12-04T10:13:48.7056298Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.7056426Z Traceback (most recent call last): 2025-12-04T10:13:48.7056961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7057079Z getattr(self, test_name)() 2025-12-04T10:13:48.7057609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7057696Z fn() 2025-12-04T10:13:48.7058206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7058313Z method(*args, **kwargs) 2025-12-04T10:13:48.7058810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7058918Z method(*args, **kwargs) 2025-12-04T10:13:48.7059416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7059549Z with policy(): 2025-12-04T10:13:48.7060057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7060167Z raise RuntimeError(msg) 2025-12-04T10:13:48.7061413Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 607059968 and is now 625934336. 
2025-12-04T10:13:48.7061450Z 2025-12-04T10:13:48.7061666Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7062395Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7062401Z 2025-12-04T10:13:48.7062665Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7062673Z 2025-12-04T10:13:48.7062678Z 2025-12-04T10:13:48.7062904Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.7063194Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.7063992Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c0681c30ea8c1a74.xml - 2025-12-04T10:13:48.7064175Z =========================== short test summary info ============================ 2025-12-04T10:13:48.7065059Z FAILED [10.7380s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.7065192Z Traceback (most recent call last): 2025-12-04T10:13:48.7065742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7065962Z getattr(self, test_name)() 2025-12-04T10:13:48.7066471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7066557Z fn() 2025-12-04T10:13:48.7067025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7067134Z method(*args, **kwargs) 2025-12-04T10:13:48.7067608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7067715Z method(*args, **kwargs) 2025-12-04T10:13:48.7068216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7068309Z with policy(): 2025-12-04T10:13:48.7068796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7068894Z raise RuntimeError(msg) 2025-12-04T10:13:48.7070061Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T10:13:48.7070069Z 2025-12-04T10:13:48.7070272Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7070944Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7070952Z 2025-12-04T10:13:48.7071204Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7071369Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
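[editor's note] Each of these runs also prints the c10d_logger barrier() UserWarning, and that message says it can be muted by giving init_process_group a device_id. A hedged sketch of that variant (the rank-to-device mapping is an assumption):

    import os
    import torch
    import torch.distributed as dist

    # Binding the process group to a device up front lets barrier() and other collectives
    # use that device instead of guessing from the current CUDA context.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    dist.init_process_group(
        "nccl",
        device_id=torch.device("cuda", local_rank),
    )
    dist.barrier()  # no "using the device under current context" warning expected now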
2025-12-04T10:13:48.7071575Z ====================== 1 failed, 26 deselected in 10.96s ======================= 2025-12-04T10:13:48.7071669Z Got exit code 1 2025-12-04T10:13:48.7071764Z Retrying single test... 2025-12-04T10:13:48.7072363Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-27fc2bea2cad5f2f.xml 2025-12-04T10:13:48.7072513Z ============================= test session starts ============================== 2025-12-04T10:13:48.7072868Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.7072977Z cachedir: .pytest_cache 2025-12-04T10:13:48.7073462Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.7073588Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.7073686Z configfile: pytest.ini 2025-12-04T10:13:48.7074191Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.7074399Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.7075171Z stepcurrent: skipping 26 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7075285Z Running 1 items in this shard 2025-12-04T10:13:48.7075291Z 2025-12-04T10:13:48.7076297Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda I1204 10:08:33.429000 93250 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 93302 2025-12-04T10:13:48.7076769Z I1204 10:08:33.430000 93250 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 93303 2025-12-04T10:13:48.7077236Z I1204 10:08:33.431000 93250 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 93304 2025-12-04T10:13:48.7077693Z I1204 10:08:33.432000 93250 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 93305 2025-12-04T10:13:48.7079956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7080058Z _warn_cpu_init() 2025-12-04T10:13:48.7082135Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.7082233Z _warn_cpu_init() 2025-12-04T10:13:48.7084235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7084333Z _warn_cpu_init() 2025-12-04T10:13:48.7086382Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7086526Z _warn_cpu_init() 2025-12-04T10:13:48.7087537Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.7087719Z return func(*args, **kwargs) 2025-12-04T10:13:48.7088209Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7088752Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7089746Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7090294Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7091396Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7091781Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7092721Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7093248Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7094376Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7094860Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7095861Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7096304Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7097261Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7097757Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7099444Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 1. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.7099820Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7100506Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7101678Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7102042Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7102809Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7103356Z [rank1]:E1204 10:08:41.889000 93303 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.7103807Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7104348Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7105466Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7106042Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7106919Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7107270Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7108128Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7108560Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7109417Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7109944Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7110798Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7111192Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7112048Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7112492Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7113992Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 
2025-12-04T10:13:48.7114355Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7115205Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7117102Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7117757Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7118925Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7119817Z [rank3]:E1204 10:08:41.889000 93305 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.7120481Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7121400Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7122990Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7123810Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7125868Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7126627Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7128328Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7129202Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7131013Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7131913Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7133979Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7134858Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7136732Z [rank2]:E1204 10:08:41.890000 93304 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7137639Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7140853Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.7141531Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7142735Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7145057Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7145960Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7147228Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7148219Z [rank2]:E1204 10:08:41.890000 93304 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.7148967Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7149913Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7151519Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7152402Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7153814Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7154200Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7155118Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7155654Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7156571Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7157021Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7157935Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7158351Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7159251Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7159719Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7161341Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 711917568 and is now 734986240. 2025-12-04T10:13:48.7161694Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7162349Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7163461Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7163802Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7164504Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7165025Z [rank0]:E1204 10:08:41.894000 93302 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.7165121Z dist init r=2, world=4 2025-12-04T10:13:48.7165225Z dist init r=3, world=4 2025-12-04T10:13:48.7165316Z dist init r=1, world=4 2025-12-04T10:13:48.7165405Z dist init r=0, world=4 2025-12-04T10:13:48.7166507Z [rank0]:[W1204 10:08:42.420995579 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.7166606Z FAILED [10.9188s] [100%] 2025-12-04T10:13:48.7166614Z 2025-12-04T10:13:48.7166761Z =================================== FAILURES =================================== 2025-12-04T10:13:48.7167250Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:48.7167367Z Traceback (most recent call last): 2025-12-04T10:13:48.7167900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.7168008Z self._join_processes(fn) 2025-12-04T10:13:48.7168604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.7168750Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.7169437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.7169552Z raise RuntimeError(error) 2025-12-04T10:13:48.7169772Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.7169887Z Traceback (most recent call last): 2025-12-04T10:13:48.7170397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7170499Z getattr(self, test_name)() 2025-12-04T10:13:48.7171005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7171092Z fn() 2025-12-04T10:13:48.7171566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7171670Z method(*args, **kwargs) 2025-12-04T10:13:48.7172140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7172233Z method(*args, **kwargs) 2025-12-04T10:13:48.7172766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7172860Z with policy(): 2025-12-04T10:13:48.7173490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7173594Z raise RuntimeError(msg) 2025-12-04T10:13:48.7175036Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 1. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T10:13:48.7175044Z 2025-12-04T10:13:48.7175271Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7175984Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7175990Z 2025-12-04T10:13:48.7176261Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7176298Z 2025-12-04T10:13:48.7176303Z 2025-12-04T10:13:48.7176524Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.7176783Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.7177600Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-27fc2bea2cad5f2f.xml - 2025-12-04T10:13:48.7177769Z =========================== short test summary info ============================ 2025-12-04T10:13:48.7178850Z FAILED [10.9188s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.7178982Z Traceback (most recent call last): 2025-12-04T10:13:48.7179532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7179654Z getattr(self, test_name)() 2025-12-04T10:13:48.7180189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7180291Z fn() 2025-12-04T10:13:48.7180800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7180905Z method(*args, **kwargs) 2025-12-04T10:13:48.7181498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7181605Z method(*args, **kwargs) 2025-12-04T10:13:48.7182119Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7182218Z with policy(): 2025-12-04T10:13:48.7182727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7182854Z raise RuntimeError(msg) 2025-12-04T10:13:48.7184099Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 1. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.7184108Z 2025-12-04T10:13:48.7184333Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7185053Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7185059Z 2025-12-04T10:13:48.7185324Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7185559Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
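[editor's note] The RuntimeError failing these runs comes from the CUDA memory-leak checker wrapped around each test: it records caching-allocator and driver-level allocations before the test body and compares them afterwards, and here both grew on every rank (512 -> 12800/16896 bytes in the allocator, roughly 18-27 MB at the driver level). The sketch below only illustrates that before/after comparison; it is not the actual CudaMemoryLeakCheck implementation in torch.testing._internal.common_utils:

    import torch

    def check_for_cuda_leak(test_fn, device=0):
        # Snapshot allocator- and driver-level memory, run the test body, then re-check.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before = torch.cuda.memory_allocated(device)
        free_before, total = torch.cuda.mem_get_info(device)
        driver_before = total - free_before

        test_fn()

        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after

        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: caching allocator "
                f"{alloc_before} -> {alloc_after} bytes, "
                f"driver {driver_before} -> {driver_after} bytes"
            )

As the log states, the single test can be rerun locally with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda, and the repro banner can be silenced with PYTORCH_PRINT_REPRO_ON_FAILURE=0.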
2025-12-04T10:13:48.7185743Z ====================== 1 failed, 32 deselected in 11.14s ======================= 2025-12-04T10:13:48.7185836Z Got exit code 1 2025-12-04T10:13:48.7185950Z Retrying single test... 2025-12-04T10:13:48.7186578Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-834db467a2bb808c.xml 2025-12-04T10:13:48.7186789Z ============================= test session starts ============================== 2025-12-04T10:13:48.7187138Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.7187244Z cachedir: .pytest_cache 2025-12-04T10:13:48.7187769Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.7187894Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.7188000Z configfile: pytest.ini 2025-12-04T10:13:48.7188548Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.7188807Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.7189620Z stepcurrent: skipping 26 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7189736Z Running 1 items in this shard 2025-12-04T10:13:48.7189742Z 2025-12-04T10:13:48.7191011Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda I1204 10:08:48.779000 93587 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 93639 2025-12-04T10:13:48.7191456Z I1204 10:08:48.780000 93587 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 93640 2025-12-04T10:13:48.7191892Z I1204 10:08:48.781000 93587 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 93641 2025-12-04T10:13:48.7192337Z I1204 10:08:48.782000 93587 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 93642 2025-12-04T10:13:48.7194155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7194254Z _warn_cpu_init() 2025-12-04T10:13:48.7196024Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.7196123Z _warn_cpu_init() 2025-12-04T10:13:48.7197895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7197996Z _warn_cpu_init() 2025-12-04T10:13:48.7200011Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7200131Z _warn_cpu_init() 2025-12-04T10:13:48.7201080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.7201185Z return func(*args, **kwargs) 2025-12-04T10:13:48.7201627Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7202128Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7203071Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7203579Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7204518Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7204904Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7205801Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7206272Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7207167Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7207622Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7208561Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7208981Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7209903Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7210364Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7212019Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 707723264 and is now 734986240. 2025-12-04T10:13:48.7212368Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7212958Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7214289Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7214697Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7215429Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7215978Z [rank0]:E1204 10:08:57.134000 93639 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.7216439Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7217001Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7218000Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7218522Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7219509Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7219917Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7220881Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7221375Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7222375Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7222855Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7223821Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7224264Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7225229Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7225721Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7227379Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 
2025-12-04T10:13:48.7227707Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7228295Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7229357Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7229677Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7230323Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7230804Z [rank1]:E1204 10:08:57.134000 93640 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.7231238Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7231704Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7232590Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7233051Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7234135Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7234682Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7235611Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7236590Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7237561Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7238038Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7238985Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7239414Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7240361Z [rank2]:E1204 10:08:57.135000 93641 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7240839Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7242522Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.7242881Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7243559Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7244698Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7245049Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7245748Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7246308Z [rank2]:E1204 10:08:57.135000 93641 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.7246753Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7247273Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7248248Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7248744Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7249702Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7250101Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7251057Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7251542Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7252474Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7252956Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7254142Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7254590Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7255560Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7256103Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7258824Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T10:13:48.7259290Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7259966Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7261141Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7261540Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7262269Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7262810Z [rank3]:E1204 10:08:57.136000 93642 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.7262924Z dist init r=0, world=4 2025-12-04T10:13:48.7263022Z dist init r=2, world=4 2025-12-04T10:13:48.7263119Z dist init r=3, world=4 2025-12-04T10:13:48.7263229Z dist init r=1, world=4 2025-12-04T10:13:48.7264399Z [rank0]:[W1204 10:08:57.643635495 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.7264510Z FAILED [9.8861s] [100%] 2025-12-04T10:13:48.7264517Z 2025-12-04T10:13:48.7264661Z =================================== FAILURES =================================== 2025-12-04T10:13:48.7264998Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:48.7265133Z Traceback (most recent call last): 2025-12-04T10:13:48.7265786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.7265928Z self._join_processes(fn) 2025-12-04T10:13:48.7266506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.7266646Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.7267237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.7267349Z raise RuntimeError(error) 2025-12-04T10:13:48.7267578Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.7267704Z Traceback (most recent call last): 2025-12-04T10:13:48.7268224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7268331Z getattr(self, test_name)() 2025-12-04T10:13:48.7268858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7268944Z fn() 2025-12-04T10:13:48.7269444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7269544Z method(*args, **kwargs) 2025-12-04T10:13:48.7270057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7270167Z method(*args, **kwargs) 2025-12-04T10:13:48.7270650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7270751Z with policy(): 2025-12-04T10:13:48.7271244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7271375Z raise RuntimeError(msg) 2025-12-04T10:13:48.7272588Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 
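The outer traceback (wrapper -> _join_processes -> _check_return_codes) shows how these distributed tests are structured: the parent spawns one worker per rank, each worker runs the test method and exits with a status code (10 is the code the harness uses when the per-rank leak check trips), and the parent re-raises if any worker exits non-zero, which is why a leak on a single device fails the whole test. A rough sketch of that spawn/join/check pattern, not the actual common_distributed.py implementation; worker and join_and_check are illustrative names:

    import multiprocessing as mp

    LEAK_EXIT_CODE = 10  # matches the "exiting process N with exit code: 10" lines above

    def worker(rank):
        # Stand-in for run_test(); a real worker would execute the test body here and
        # exit non-zero when its per-rank check fails.
        raise SystemExit(LEAK_EXIT_CODE)

    def join_and_check(world_size=4):
        procs = [mp.Process(target=worker, args=(r,)) for r in range(world_size)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

    if __name__ == "__main__":
        join_and_check()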
2025-12-04T10:13:48.7272595Z 2025-12-04T10:13:48.7272805Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7273514Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7273567Z 2025-12-04T10:13:48.7273822Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7273827Z 2025-12-04T10:13:48.7273832Z 2025-12-04T10:13:48.7274047Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.7274314Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.7275089Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-834db467a2bb808c.xml - 2025-12-04T10:13:48.7275262Z =========================== short test summary info ============================ 2025-12-04T10:13:48.7276120Z FAILED [9.8861s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.7276237Z Traceback (most recent call last): 2025-12-04T10:13:48.7276780Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7276889Z getattr(self, test_name)() 2025-12-04T10:13:48.7277419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7277507Z fn() 2025-12-04T10:13:48.7278025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7278142Z method(*args, **kwargs) 2025-12-04T10:13:48.7278802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7278915Z method(*args, **kwargs) 2025-12-04T10:13:48.7279592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7279695Z with policy(): 2025-12-04T10:13:48.7280213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7280324Z raise RuntimeError(msg) 2025-12-04T10:13:48.7281575Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T10:13:48.7281593Z 2025-12-04T10:13:48.7281804Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7282598Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7282605Z 2025-12-04T10:13:48.7282879Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7283058Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
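Two of the warnings repeated in this run point at their own fixes: the _init_utils.py UserWarning recommends constructing FSDP with device_id so sharding initialization runs on the GPU (and so sync_module_states=True has a GPU-resident module to broadcast from), and the ProcessGroupNCCL warning asks for an explicit destroy_process_group() before exit. A minimal single-process sketch of both, assuming one visible GPU; the tiny Linear model and the hard-coded rendezvous address are illustrative and not taken from the test:

    import os
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Minimal single-process rendezvous so the sketch is self-contained (values are illustrative).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=0, world_size=1)
    torch.cuda.set_device(0)

    model = torch.nn.Linear(16, 16)               # starts on CPU, like the module in the warning
    fsdp_model = FSDP(
        model,
        device_id=torch.cuda.current_device(),    # run sharding initialization on the GPU
        sync_module_states=True,                  # needs the module on the GPU to broadcast states
    )

    dist.destroy_process_group()                  # also avoids the NCCL shutdown warning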
2025-12-04T10:13:48.7283236Z ====================== 1 failed, 32 deselected in 10.10s ======================= 2025-12-04T10:13:48.7283345Z Got exit code 1 2025-12-04T10:13:48.7284020Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.7284437Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.7285059Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f48f6c290350ef90.xml 2025-12-04T10:13:48.7285220Z ============================= test session starts ============================== 2025-12-04T10:13:48.7285580Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.7285733Z cachedir: .pytest_cache 2025-12-04T10:13:48.7286261Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.7286383Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.7286488Z configfile: pytest.ini 2025-12-04T10:13:48.7287037Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.7287249Z collecting ... collected 60 items / 27 deselected / 33 selected 2025-12-04T10:13:48.7287395Z stepcurrent: skipping 27 already run items. 2025-12-04T10:13:48.7287511Z Running 6 items in this shard 2025-12-04T10:13:48.7287517Z 2025-12-04T10:13:48.7288763Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda I1204 10:09:03.730000 93924 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 93976 2025-12-04T10:13:48.7289280Z I1204 10:09:03.731000 93924 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 93977 2025-12-04T10:13:48.7289770Z I1204 10:09:03.732000 93924 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 93978 2025-12-04T10:13:48.7290275Z I1204 10:09:03.732000 93924 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 93979 2025-12-04T10:13:48.7292536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7292634Z _warn_cpu_init() 2025-12-04T10:13:48.7294823Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.7294927Z _warn_cpu_init() 2025-12-04T10:13:48.7296976Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7297077Z _warn_cpu_init() 2025-12-04T10:13:48.7299093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7299221Z _warn_cpu_init() 2025-12-04T10:13:48.7300223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.7300335Z return func(*args, **kwargs) 2025-12-04T10:13:48.7300793Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7301369Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7302369Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7302890Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7303883Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7304287Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7305250Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7305839Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7306888Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7307346Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7308254Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7308672Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7309590Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7310228Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7312068Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 2025-12-04T10:13:48.7312440Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7313078Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7314405Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7314763Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7315462Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7316021Z [rank0]:E1204 10:09:11.412000 93976 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.7316456Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7316984Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7317957Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7318458Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7319414Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7319809Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7320766Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7321238Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7322180Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7322650Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7323583Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7324013Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7324951Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7325452Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7327259Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 611254272 and is now 651100160. 
2025-12-04T10:13:48.7327667Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7328303Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7329616Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7329999Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7330705Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7331236Z [rank1]:E1204 10:09:11.413000 93977 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.7331674Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7332202Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7333173Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7333954Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7334952Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7335398Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7336362Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7336849Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7337821Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7338304Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7339275Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7339720Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7340724Z [rank3]:E1204 10:09:11.414000 93979 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7341216Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7343075Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T10:13:48.7343477Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7344134Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7345688Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7346195Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7346905Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7348275Z [rank3]:E1204 10:09:11.414000 93979 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.7349377Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7350472Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7352099Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7353714Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7355362Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7356836Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7358286Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7359823Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7361368Z [rank2]:E1204 10:09:11.420000 93978 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7362886Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7364456Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7365950Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7367455Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7369036Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7371533Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T10:13:48.7374077Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7375244Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7377389Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7379438Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7380668Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7382054Z [rank2]:E1204 10:09:11.420000 93978 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.7382849Z dist init r=2, world=4 2025-12-04T10:13:48.7383134Z dist init r=1, world=4 2025-12-04T10:13:48.7383408Z dist init r=0, world=4 2025-12-04T10:13:48.7383668Z dist init r=3, world=4 2025-12-04T10:13:48.7385078Z [rank0]:[W1204 10:09:11.933292840 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.7386462Z FAILED [9.6711s] [ 16%] 2025-12-04T10:13:48.7386638Z 2025-12-04T10:13:48.7386789Z =================================== FAILURES =================================== 2025-12-04T10:13:48.7387584Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda _ 2025-12-04T10:13:48.7388353Z Traceback (most recent call last): 2025-12-04T10:13:48.7389145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.7389929Z self._join_processes(fn) 2025-12-04T10:13:48.7390908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.7391686Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.7392467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.7393225Z raise RuntimeError(error) 2025-12-04T10:13:48.7393629Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.7394061Z Traceback (most recent call last): 2025-12-04T10:13:48.7394780Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7395487Z getattr(self, test_name)() 2025-12-04T10:13:48.7396151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7396838Z fn() 2025-12-04T10:13:48.7397399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7398101Z method(*args, **kwargs) 2025-12-04T10:13:48.7398732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7399396Z method(*args, **kwargs) 2025-12-04T10:13:48.7400019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7400684Z with policy(): 2025-12-04T10:13:48.7401288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7401989Z raise RuntimeError(msg) 2025-12-04T10:13:48.7403426Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 604962816 and is now 651100160. 
2025-12-04T10:13:48.7404797Z 2025-12-04T10:13:48.7404987Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7406088Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7406992Z 2025-12-04T10:13:48.7407232Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7407602Z 2025-12-04T10:13:48.7407606Z 2025-12-04T10:13:48.7407804Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.7408355Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.7409419Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f48f6c290350ef90.xml - 2025-12-04T10:13:48.7410401Z =========================== short test summary info ============================ 2025-12-04T10:13:48.7411621Z FAILED [9.6711s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.7412775Z Traceback (most recent call last): 2025-12-04T10:13:48.7413547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7414587Z getattr(self, test_name)() 2025-12-04T10:13:48.7415337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7416103Z fn() 2025-12-04T10:13:48.7416747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7417491Z method(*args, **kwargs) 2025-12-04T10:13:48.7418204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7418952Z method(*args, **kwargs) 2025-12-04T10:13:48.7419654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7420381Z with policy(): 2025-12-04T10:13:48.7421111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7421874Z raise RuntimeError(msg) 2025-12-04T10:13:48.7423478Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T10:13:48.7425050Z 2025-12-04T10:13:48.7425298Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7427473Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7429208Z 2025-12-04T10:13:48.7429687Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7430777Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
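The pattern that repeats through this log ("Got exit code 1", "Retrying single test...", then either a pass or "FAILED CONSISTENTLY ... continuing with the rest of the tests due to continue-through-error being set") is the shard runner's retry policy: when a pytest invocation fails, the single failing test is rerun in a fresh session (stepcurrent skips the items that already ran), and only a second failure marks it as consistently failing while the remaining tests still execute. A rough sketch of that control flow, not the actual run_test.py code; run_pytest and the argument lists are illustrative:

    import subprocess

    def run_pytest(args):
        # Illustrative wrapper; the real harness also wires up xml report paths, stepcurrent, etc.
        return subprocess.run(["python", "-m", "pytest", *args]).returncode

    def run_with_single_test_retry(shard_args, failing_test_id, continue_through_error=True):
        if run_pytest(shard_args) == 0:
            return True
        print("Got exit code 1")
        print("Retrying single test...")
        if run_pytest([failing_test_id]) == 0:
            return True                              # flaky: passed on the isolated rerun
        print(f"FAILED CONSISTENTLY: {failing_test_id}")
        # continue-through-error: report the failure but keep running the remaining tests.
        return continue_through_error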
2025-12-04T10:13:48.7431483Z ======================= 1 failed, 27 deselected in 9.89s ======================= 2025-12-04T10:13:48.7432211Z Got exit code 1 2025-12-04T10:13:48.7432601Z Retrying single test... 2025-12-04T10:13:48.7433857Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1ee2f71fb8de6413.xml 2025-12-04T10:13:48.7435468Z ============================= test session starts ============================== 2025-12-04T10:13:48.7436764Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.7437846Z cachedir: .pytest_cache 2025-12-04T10:13:48.7439082Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.7440404Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.7440998Z configfile: pytest.ini 2025-12-04T10:13:48.7442231Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.7443843Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.7446334Z stepcurrent: skipping 27 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7448565Z Running 1 items in this shard 2025-12-04T10:13:48.7448934Z 2025-12-04T10:13:48.7451263Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda I1204 10:09:18.150000 94261 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 94313 2025-12-04T10:13:48.7454963Z I1204 10:09:18.151000 94261 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 94314 2025-12-04T10:13:48.7457132Z I1204 10:09:18.151000 94261 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 94315 2025-12-04T10:13:48.7459229Z I1204 10:09:18.152000 94261 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 94316 2025-12-04T10:13:48.7464249Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7466840Z _warn_cpu_init() 2025-12-04T10:13:48.7469064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.7471178Z _warn_cpu_init() 2025-12-04T10:13:48.7473224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7475359Z _warn_cpu_init() 2025-12-04T10:13:48.7477444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7480037Z _warn_cpu_init() 2025-12-04T10:13:48.7481192Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.7482441Z return func(*args, **kwargs) 2025-12-04T10:13:48.7483125Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7484266Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7485933Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7487586Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7489303Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7490957Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7492466Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7494395Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7495994Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7497585Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7499181Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7500783Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7502330Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7503942Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7506628Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 1. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T10:13:48.7508726Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7509766Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7511680Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7513313Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7514398Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7515641Z [rank1]:E1204 10:09:25.796000 94314 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.7516659Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7517649Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7519138Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7520628Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7522082Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7523421Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7524750Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7526154Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7527566Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7528976Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7530399Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7531770Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7533144Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7534945Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7537456Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 714014720 and is now 760152064. 
2025-12-04T10:13:48.7539848Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7540997Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7543146Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7544969Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7546362Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7547600Z [rank0]:E1204 10:09:25.796000 94313 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.7548610Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7549608Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7551132Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7552590Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7554028Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7555379Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7557473Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7559293Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7560869Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7562364Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7563862Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7565427Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7566886Z [rank2]:E1204 10:09:25.797000 94315 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7568409Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7570625Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T10:13:48.7572742Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7574073Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7576214Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7578047Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7579471Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7580885Z [rank2]:E1204 10:09:25.797000 94315 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.7582099Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7583228Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7584934Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7586585Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7588217Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7589743Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7591469Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7592997Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7594404Z [rank3]:E1204 10:09:25.798000 94316 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7595851Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7597261Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7598632Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7599996Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7601444Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7603651Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.7605739Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7606771Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7608660Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7610295Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7611416Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7612661Z [rank3]:E1204 10:09:25.798000 94316 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.7613428Z dist init r=0, world=4 2025-12-04T10:13:48.7613850Z dist init r=1, world=4 2025-12-04T10:13:48.7614123Z dist init r=3, world=4 2025-12-04T10:13:48.7614397Z dist init r=2, world=4 2025-12-04T10:13:48.7615783Z [rank0]:[W1204 10:09:26.305540414 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.7617160Z FAILED [9.8973s] [100%] 2025-12-04T10:13:48.7617356Z 2025-12-04T10:13:48.7617505Z =================================== FAILURES =================================== 2025-12-04T10:13:48.7618296Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda _ 2025-12-04T10:13:48.7619045Z Traceback (most recent call last): 2025-12-04T10:13:48.7619823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.7620651Z self._join_processes(fn) 2025-12-04T10:13:48.7621441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.7622294Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.7623177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.7624064Z raise RuntimeError(error) 2025-12-04T10:13:48.7624496Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.7624995Z Traceback (most recent call last): 2025-12-04T10:13:48.7625768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7626599Z getattr(self, test_name)() 2025-12-04T10:13:48.7627247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7627925Z fn() 2025-12-04T10:13:48.7628493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7629184Z method(*args, **kwargs) 2025-12-04T10:13:48.7629805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7630469Z method(*args, **kwargs) 2025-12-04T10:13:48.7631086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7631731Z with policy(): 2025-12-04T10:13:48.7632329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7633002Z raise RuntimeError(msg) 2025-12-04T10:13:48.7634423Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 1. CUDA driver allocated memory was 611254272 and is now 651100160. 
2025-12-04T10:13:48.7635796Z 2025-12-04T10:13:48.7635985Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7637078Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7637986Z 2025-12-04T10:13:48.7638248Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7638604Z 2025-12-04T10:13:48.7638754Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.7639114Z Traceback (most recent call last): 2025-12-04T10:13:48.7639805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7640510Z getattr(self, test_name)() 2025-12-04T10:13:48.7641176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7641841Z fn() 2025-12-04T10:13:48.7642405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7643081Z method(*args, **kwargs) 2025-12-04T10:13:48.7643692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7644354Z method(*args, **kwargs) 2025-12-04T10:13:48.7644970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7645629Z with policy(): 2025-12-04T10:13:48.7646241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7646907Z raise RuntimeError(msg) 2025-12-04T10:13:48.7648332Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.7649715Z 2025-12-04T10:13:48.7649915Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7651002Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7651907Z 2025-12-04T10:13:48.7652140Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7652501Z 2025-12-04T10:13:48.7652505Z 2025-12-04T10:13:48.7652700Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.7653395Z Process 1 terminated with exit code 10, terminating remaining processes. 
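The RuntimeError reported above comes from the harness's CUDA memory-leak check, which snapshots GPU memory before the test body and compares it with a second snapshot afterwards; the two number pairs in the message (caching allocator 512 -> 53760 bytes, driver 611254272 -> 651100160 bytes on device 1) are those snapshots. Below is a minimal, hypothetical Python sketch of that kind of before/after comparison; the helper name assert_no_cuda_leak and the tolerance argument are illustrative only and are not the actual implementation in torch/testing/_internal/common_utils.py.

    import contextlib
    import torch

    @contextlib.contextmanager
    def assert_no_cuda_leak(device=0, tolerance_bytes=0):
        # Snapshot the caching allocator and the driver-level view before the test body.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)
        free_before, total = torch.cuda.mem_get_info(device)
        yield
        # Re-measure after the body; release cached blocks so the allocator number is meaningful.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        if alloc_after > alloc_before + tolerance_bytes:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: caching allocator went from "
                f"{alloc_before} to {alloc_after} bytes; driver free memory went from "
                f"{free_before} to {free_after} of {total} bytes"
            )

As the repro command above indicates, the real check is enabled locally by setting PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 when running the single test.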
2025-12-04T10:13:48.7654726Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1ee2f71fb8de6413.xml - 2025-12-04T10:13:48.7655829Z =========================== short test summary info ============================ 2025-12-04T10:13:48.7657189Z FAILED [9.8973s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.7658483Z Traceback (most recent call last): 2025-12-04T10:13:48.7659260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7660058Z getattr(self, test_name)() 2025-12-04T10:13:48.7660804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7661572Z fn() 2025-12-04T10:13:48.7662198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7662947Z method(*args, **kwargs) 2025-12-04T10:13:48.7663649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7664395Z method(*args, **kwargs) 2025-12-04T10:13:48.7665122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7666064Z with policy(): 2025-12-04T10:13:48.7666656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7667318Z raise RuntimeError(msg) 2025-12-04T10:13:48.7668741Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 1. CUDA driver allocated memory was 611254272 and is now 651100160. 
2025-12-04T10:13:48.7670111Z 2025-12-04T10:13:48.7670306Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7671402Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7672298Z 2025-12-04T10:13:48.7672539Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7672895Z 2025-12-04T10:13:48.7673036Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.7673435Z Traceback (most recent call last): 2025-12-04T10:13:48.7674123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7674815Z getattr(self, test_name)() 2025-12-04T10:13:48.7675476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7676152Z fn() 2025-12-04T10:13:48.7676747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7677394Z method(*args, **kwargs) 2025-12-04T10:13:48.7678012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7678839Z method(*args, **kwargs) 2025-12-04T10:13:48.7679680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7680424Z with policy(): 2025-12-04T10:13:48.7681093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7681914Z raise RuntimeError(msg) 2025-12-04T10:13:48.7683507Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.7685053Z 2025-12-04T10:13:48.7685272Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7686496Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7687513Z 2025-12-04T10:13:48.7687784Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7688364Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.7688852Z ====================== 1 failed, 32 deselected in 10.11s ======================= 2025-12-04T10:13:48.7695239Z Got exit code 1 2025-12-04T10:13:48.7695564Z Retrying single test... 
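The UserWarning repeated in these sessions from torch/distributed/fsdp/_init_utils.py recommends passing device_id to FSDP so that sharding initialization runs on GPU and sync_module_states=True is usable; the c10d barrier() warning likewise suggests specifying device_id in init_process_group, and the ProcessGroupNCCL warning notes that destroy_process_group() was never called before exit. A minimal sketch of that recommended setup follows; the placeholder Linear module, the function name, and the rendezvous handling are illustrative and are not the code of test_fsdp_core.py.

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def run_rank(rank: int, world_size: int) -> None:
        # Assumes MASTER_ADDR/MASTER_PORT are provided via the environment.
        device = torch.device("cuda", rank)
        # Passing device_id here addresses the barrier() "device under current context" warning.
        dist.init_process_group("nccl", rank=rank, world_size=world_size, device_id=device)
        module = torch.nn.Linear(8, 8)  # constructed on CPU, as in the warning
        # device_id moves the module to GPU before sharding initialization, which also
        # satisfies the requirement for sync_module_states=True.
        model = FSDP(module, device_id=device, sync_module_states=True)
        # ... forward/backward/optimizer step would go here ...
        # Explicit teardown avoids the "destroy_process_group() was not called" warning.
        dist.destroy_process_group()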
2025-12-04T10:13:48.7696380Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7f8857b703d650c5.xml 2025-12-04T10:13:48.7697308Z ============================= test session starts ============================== 2025-12-04T10:13:48.7698075Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.7698650Z cachedir: .pytest_cache 2025-12-04T10:13:48.7699344Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.7700109Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.7700448Z configfile: pytest.ini 2025-12-04T10:13:48.7701156Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.7702030Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.7703343Z stepcurrent: skipping 27 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7704540Z Running 1 items in this shard 2025-12-04T10:13:48.7704744Z 2025-12-04T10:13:48.7706095Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda I1204 10:09:32.589000 94598 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 94650 2025-12-04T10:13:48.7707915Z I1204 10:09:32.590000 94598 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 94651 2025-12-04T10:13:48.7708966Z I1204 10:09:32.591000 94598 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 94652 2025-12-04T10:13:48.7710007Z I1204 10:09:32.592000 94598 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 94653 2025-12-04T10:13:48.7712629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7714594Z _warn_cpu_init() 2025-12-04T10:13:48.7716503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7718497Z _warn_cpu_init() 2025-12-04T10:13:48.7720411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7722373Z _warn_cpu_init() 2025-12-04T10:13:48.7724547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7726620Z _warn_cpu_init() 2025-12-04T10:13:48.7727728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.7728881Z return func(*args, **kwargs) 2025-12-04T10:13:48.7729509Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7730556Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7732118Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7733918Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7736004Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7738872Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7741582Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7744447Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7747543Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7750356Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7753293Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7756218Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7759078Z [rank2]:E1204 10:09:40.316000 94652 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7762014Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7766383Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.7770746Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7772582Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7775032Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7776876Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7778089Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7779691Z [rank2]:E1204 10:09:40.316000 94652 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.7780817Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7781930Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7783603Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7785234Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7786942Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7788477Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7789980Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7791752Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7793153Z [rank0]:E1204 10:09:40.316000 94650 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7794547Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7795991Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7797350Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7798709Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7800110Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7802313Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 0. CUDA driver allocated memory was 720306176 and is now 760152064. 2025-12-04T10:13:48.7804391Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7805419Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7807339Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7808961Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7810035Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7811256Z [rank0]:E1204 10:09:40.316000 94650 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.7812258Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7813314Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7815135Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7816757Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7818386Z [rank3]:E1204 10:09:40.316000 94653 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7819926Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7821403Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7822978Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7824545Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7826325Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7827713Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7829071Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7830437Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7831834Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7834028Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T10:13:48.7836128Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7837147Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7839029Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7840639Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7841712Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7842949Z [rank3]:E1204 10:09:40.316000 94653 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.7843943Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7844941Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7846413Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7847860Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7849405Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7850753Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7852067Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7853543Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7855278Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7857605Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7859436Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7860973Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7862517Z [rank1]:E1204 10:09:40.322000 94651 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7864104Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7866725Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 1. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T10:13:48.7868943Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7869959Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7871846Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7873473Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7874550Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7875816Z [rank1]:E1204 10:09:40.322000 94651 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.7876501Z dist init r=2, world=4 2025-12-04T10:13:48.7876755Z dist init r=3, world=4 2025-12-04T10:13:48.7876990Z dist init r=1, world=4 2025-12-04T10:13:48.7877216Z dist init r=0, world=4 2025-12-04T10:13:48.7878390Z [rank0]:[W1204 10:09:40.838235553 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.7880058Z FAILED [9.6271s] [100%] 2025-12-04T10:13:48.7880230Z 2025-12-04T10:13:48.7880392Z =================================== FAILURES =================================== 2025-12-04T10:13:48.7881169Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda _ 2025-12-04T10:13:48.7881925Z Traceback (most recent call last): 2025-12-04T10:13:48.7882698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.7883558Z self._join_processes(fn) 2025-12-04T10:13:48.7884332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.7885184Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.7886053Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.7886898Z raise RuntimeError(error) 2025-12-04T10:13:48.7887327Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.7887806Z Traceback (most recent call last): 2025-12-04T10:13:48.7888564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7889339Z getattr(self, test_name)() 2025-12-04T10:13:48.7890072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7890825Z fn() 2025-12-04T10:13:48.7891649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7892301Z method(*args, **kwargs) 2025-12-04T10:13:48.7892920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7893819Z method(*args, **kwargs) 2025-12-04T10:13:48.7894574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7895315Z with policy(): 2025-12-04T10:13:48.7895981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7896731Z raise RuntimeError(msg) 2025-12-04T10:13:48.7898330Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 
2025-12-04T10:13:48.7899871Z 2025-12-04T10:13:48.7900082Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7901318Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7902332Z 2025-12-04T10:13:48.7902597Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7902993Z 2025-12-04T10:13:48.7902998Z 2025-12-04T10:13:48.7903283Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.7903891Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.7905088Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7f8857b703d650c5.xml - 2025-12-04T10:13:48.7906353Z =========================== short test summary info ============================ 2025-12-04T10:13:48.7907587Z FAILED [9.6271s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.7908723Z Traceback (most recent call last): 2025-12-04T10:13:48.7909412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7910107Z getattr(self, test_name)() 2025-12-04T10:13:48.7910762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7911921Z fn() 2025-12-04T10:13:48.7912477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7913135Z method(*args, **kwargs) 2025-12-04T10:13:48.7913740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7914392Z method(*args, **kwargs) 2025-12-04T10:13:48.7915004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7915649Z with policy(): 2025-12-04T10:13:48.7916238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7916906Z raise RuntimeError(msg) 2025-12-04T10:13:48.7918336Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.7919702Z 2025-12-04T10:13:48.7919893Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7920970Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7921874Z 2025-12-04T10:13:48.7922134Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7922642Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:13:48.7923074Z ======================= 1 failed, 32 deselected in 9.84s ======================= 2025-12-04T10:13:48.7923431Z Got exit code 1 2025-12-04T10:13:48.7924284Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T10:13:48.7925464Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.7926494Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-19e11b35947b0a14.xml 2025-12-04T10:13:48.7927293Z ============================= test session starts ============================== 2025-12-04T10:13:48.7927865Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.7928377Z cachedir: .pytest_cache 2025-12-04T10:13:48.7928981Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.7929685Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.7929987Z configfile: pytest.ini 2025-12-04T10:13:48.7930610Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.7931385Z collecting ... collected 60 items / 28 deselected / 32 selected 2025-12-04T10:13:48.7931808Z stepcurrent: skipping 28 already run items. 2025-12-04T10:13:48.7932163Z Running 5 items in this shard 2025-12-04T10:13:48.7932343Z 2025-12-04T10:13:48.7933500Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda I1204 10:09:47.070000 94935 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 94987 2025-12-04T10:13:48.7935467Z I1204 10:09:47.071000 94935 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 94988 2025-12-04T10:13:48.7936587Z I1204 10:09:47.071000 94935 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 94989 2025-12-04T10:13:48.7937733Z I1204 10:09:47.072000 94935 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 94990 2025-12-04T10:13:48.7940355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7942577Z _warn_cpu_init() 2025-12-04T10:13:48.7944723Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.7946933Z _warn_cpu_init() 2025-12-04T10:13:48.7948873Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7950836Z _warn_cpu_init() 2025-12-04T10:13:48.7952735Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.7954696Z _warn_cpu_init() 2025-12-04T10:13:48.7955708Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.7956810Z return func(*args, **kwargs) 2025-12-04T10:13:48.7957404Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7958426Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7959892Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7961344Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7962819Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7964171Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7965494Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7966885Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7968311Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7969712Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7971114Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7972472Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7974100Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7975686Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7978171Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 714014720 and is now 737083392. 2025-12-04T10:13:48.7980686Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7981838Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.7983915Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.7985701Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.7986910Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.7988303Z [rank0]:E1204 10:09:54.941000 94987 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.7989491Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.7990031Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.7991202Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.7991692Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.7992567Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.7992922Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.7993800Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7994226Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7995083Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.7995510Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.7996359Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.7996749Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.7997604Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.7998033Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.7999703Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 2. CUDA driver allocated memory was 609157120 and is now 628031488. 
2025-12-04T10:13:48.8000027Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8000606Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8001754Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8002074Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8002738Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8003223Z [rank2]:E1204 10:09:54.941000 94989 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.8003626Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8004123Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8005011Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8005465Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8006336Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8006720Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8007563Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8007994Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8008849Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8009277Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8010130Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8010523Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8011405Z [rank3]:E1204 10:09:54.942000 94990 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8011837Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8013508Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 611254272 and is now 628031488. 2025-12-04T10:13:48.8014028Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8014688Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8016015Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8016378Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8017102Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8017644Z [rank3]:E1204 10:09:54.942000 94990 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.8018128Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8018656Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8019647Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8020185Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8021161Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8021564Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8022516Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8023001Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8023969Z [rank1]:E1204 10:09:54.942000 94988 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8024454Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8025413Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8025988Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8026846Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8027276Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8028892Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T10:13:48.8029215Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8029792Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8030975Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8031295Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8031956Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8032436Z [rank1]:E1204 10:09:54.942000 94988 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.8032534Z dist init r=2, world=4 2025-12-04T10:13:48.8032621Z dist init r=0, world=4 2025-12-04T10:13:48.8032703Z dist init r=3, world=4 2025-12-04T10:13:48.8032798Z dist init r=1, world=4 2025-12-04T10:13:48.8033811Z [rank0]:[W1204 10:09:55.457943297 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.8033926Z FAILED [10.5168s] [ 20%] 2025-12-04T10:13:48.8033932Z 2025-12-04T10:13:48.8034066Z =================================== FAILURES =================================== 2025-12-04T10:13:48.8034471Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda _ 2025-12-04T10:13:48.8034587Z Traceback (most recent call last): 2025-12-04T10:13:48.8035067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.8035166Z self._join_processes(fn) 2025-12-04T10:13:48.8035688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.8035814Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.8036349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.8036454Z raise RuntimeError(error) 2025-12-04T10:13:48.8036662Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.8036775Z Traceback (most recent call last): 2025-12-04T10:13:48.8037245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8037368Z getattr(self, test_name)() 2025-12-04T10:13:48.8037846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8037926Z fn() 2025-12-04T10:13:48.8038372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8038472Z method(*args, **kwargs) 2025-12-04T10:13:48.8038916Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8039017Z method(*args, **kwargs) 2025-12-04T10:13:48.8039454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8039541Z with policy(): 2025-12-04T10:13:48.8039991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8040089Z raise RuntimeError(msg) 2025-12-04T10:13:48.8041325Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 714014720 and is now 737083392. 
2025-12-04T10:13:48.8041331Z 2025-12-04T10:13:48.8041521Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8042642Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8042719Z 2025-12-04T10:13:48.8043156Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8043165Z 2025-12-04T10:13:48.8043416Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.8043609Z Traceback (most recent call last): 2025-12-04T10:13:48.8044468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8044636Z getattr(self, test_name)() 2025-12-04T10:13:48.8045494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8045639Z fn() 2025-12-04T10:13:48.8046577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8046741Z method(*args, **kwargs) 2025-12-04T10:13:48.8047487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8047653Z method(*args, **kwargs) 2025-12-04T10:13:48.8048423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8048571Z with policy(): 2025-12-04T10:13:48.8049395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8049556Z raise RuntimeError(msg) 2025-12-04T10:13:48.8051966Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 2. CUDA driver allocated memory was 609157120 and is now 628031488. 2025-12-04T10:13:48.8051987Z 2025-12-04T10:13:48.8052347Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8054290Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8054310Z 2025-12-04T10:13:48.8054873Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8054884Z 2025-12-04T10:13:48.8054891Z 2025-12-04T10:13:48.8055286Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.8055771Z Process 0 terminated with exit code 10, terminating remaining processes. 
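Note: the "Process 0 terminated with exit code 10, terminating remaining processes." line is printed by the parent test process while joining its per-rank workers; the RuntimeError in the FAILURES section is then raised from the child's nonzero exit code. A condensed, hypothetical sketch of that spawn/join/check pattern follows (placeholder names and simplified logic, not the common_distributed.py implementation).

```python
# Hypothetical sketch only (placeholder names, simplified logic), mirroring the
# parent/worker behaviour visible in the log: spawn one process per rank, join,
# and raise if any child exited with a nonzero code (10 marks a failed test here).
import multiprocessing as mp

TEST_FAILURE_EXIT_CODE = 10  # assumed to mirror the exit code seen in the log

def run_rank_test(rank: int) -> None:
    # Stand-in for the real per-rank test body.
    pass

def worker(rank: int) -> None:
    try:
        run_rank_test(rank)
    except Exception:
        raise SystemExit(TEST_FAILURE_EXIT_CODE)

def run_multiprocess_test(world_size: int = 4) -> None:
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=worker, args=(rank,)) for rank in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    for rank, p in enumerate(procs):
        if p.exitcode != 0:
            raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

if __name__ == "__main__":
    run_multiprocess_test()
```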
2025-12-04T10:13:48.8057218Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-19e11b35947b0a14.xml - 2025-12-04T10:13:48.8057546Z =========================== short test summary info ============================ 2025-12-04T10:13:48.8059461Z FAILED [10.5168s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.8059695Z Traceback (most recent call last): 2025-12-04T10:13:48.8060777Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8060981Z getattr(self, test_name)() 2025-12-04T10:13:48.8062036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8062294Z fn() 2025-12-04T10:13:48.8063209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8063411Z method(*args, **kwargs) 2025-12-04T10:13:48.8064357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8064552Z method(*args, **kwargs) 2025-12-04T10:13:48.8065622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8065784Z with policy(): 2025-12-04T10:13:48.8066703Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8066883Z raise RuntimeError(msg) 2025-12-04T10:13:48.8069244Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 714014720 and is now 737083392. 
2025-12-04T10:13:48.8069357Z 2025-12-04T10:13:48.8069720Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8071152Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8071167Z 2025-12-04T10:13:48.8071605Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8071615Z 2025-12-04T10:13:48.8071867Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.8072059Z Traceback (most recent call last): 2025-12-04T10:13:48.8072980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8073151Z getattr(self, test_name)() 2025-12-04T10:13:48.8074051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8074201Z fn() 2025-12-04T10:13:48.8075078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8075240Z method(*args, **kwargs) 2025-12-04T10:13:48.8076050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8076225Z method(*args, **kwargs) 2025-12-04T10:13:48.8077144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8077308Z with policy(): 2025-12-04T10:13:48.8078066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8078221Z raise RuntimeError(msg) 2025-12-04T10:13:48.8079886Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 2. CUDA driver allocated memory was 609157120 and is now 628031488. 2025-12-04T10:13:48.8079906Z 2025-12-04T10:13:48.8080116Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8080959Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8080965Z 2025-12-04T10:13:48.8081236Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8081410Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.8081696Z ====================== 1 failed, 28 deselected in 10.73s ======================= 2025-12-04T10:13:48.8081790Z Got exit code 1 2025-12-04T10:13:48.8081897Z Retrying single test... 
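Note: the UserWarning from _init_utils.py repeated throughout this log recommends passing `device_id` so FSDP performs its sharding initialization on the GPU rather than on CPU. A minimal sketch of that change is below, assuming the default process group has already been initialized for this rank; the model here is a placeholder.

```python
# Minimal sketch of the change the repeated _init_utils.py UserWarning suggests:
# pass `device_id` so FSDP moves the CPU-resident module to the GPU before
# sharding (and so `sync_module_states=True` has a GPU module to broadcast).
# Assumes torch.distributed.init_process_group() has already been called.
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(rank: int) -> FSDP:
    model = nn.Linear(1024, 1024)                 # built on CPU, as in the warning
    return FSDP(
        model,
        device_id=torch.device("cuda", rank),     # sharding init runs on the GPU
        sync_module_states=True,
    )
```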
2025-12-04T10:13:48.8082529Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3b5d6f54eb5c8ad3.xml 2025-12-04T10:13:48.8082690Z ============================= test session starts ============================== 2025-12-04T10:13:48.8083083Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.8083193Z cachedir: .pytest_cache 2025-12-04T10:13:48.8083706Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.8083832Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.8083933Z configfile: pytest.ini 2025-12-04T10:13:48.8084471Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.8084692Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.8085651Z stepcurrent: skipping 28 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8085767Z Running 1 items in this shard 2025-12-04T10:13:48.8085775Z 2025-12-04T10:13:48.8086975Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda I1204 10:10:02.019000 95272 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 95324 2025-12-04T10:13:48.8087466Z I1204 10:10:02.020000 95272 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 95325 2025-12-04T10:13:48.8087956Z I1204 10:10:02.021000 95272 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 95326 2025-12-04T10:13:48.8088443Z I1204 10:10:02.022000 95272 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 95327 2025-12-04T10:13:48.8090487Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8090626Z _warn_cpu_init() 2025-12-04T10:13:48.8092727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8092816Z _warn_cpu_init() 2025-12-04T10:13:48.8094970Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8095069Z _warn_cpu_init() 2025-12-04T10:13:48.8097084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8097190Z _warn_cpu_init() 2025-12-04T10:13:48.8098174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.8098321Z return func(*args, **kwargs) 2025-12-04T10:13:48.8098779Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8099317Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8100316Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8100854Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8101844Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8102243Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8103209Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8103691Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8104657Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8105141Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8106311Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8106706Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8107555Z [rank1]:E1204 10:10:09.948000 95325 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8107997Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8109603Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 611254272 and is now 628031488. 2025-12-04T10:13:48.8109928Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8110532Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8111670Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8112040Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8112668Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8113155Z [rank1]:E1204 10:10:09.948000 95325 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.8113558Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8114031Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8114941Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8115386Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8116271Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8116620Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8117476Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8117907Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8118760Z [rank0]:E1204 10:10:09.948000 95324 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8119216Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8120066Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8120463Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8121311Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8121747Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8123386Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 714014720 and is now 737083392. 2025-12-04T10:13:48.8123712Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8124295Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8125430Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8125785Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8126413Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8126898Z [rank0]:E1204 10:10:09.948000 95324 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.8127325Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8127798Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8128677Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8129124Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8129998Z [rank3]:E1204 10:10:09.948000 95327 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8130348Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8131198Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8131653Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8132502Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8132933Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8134041Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8134490Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8135450Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8135941Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8137781Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 609157120 and is now 628031488. 
2025-12-04T10:13:48.8138151Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8138835Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8140124Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8140485Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8141227Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8141776Z [rank3]:E1204 10:10:09.948000 95327 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.8142222Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8142752Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8143751Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8144255Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8145246Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8145747Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8146747Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8147178Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8148030Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8148457Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8149298Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8149699Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8150637Z [rank2]:E1204 10:10:09.949000 95326 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8151070Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8152680Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T10:13:48.8153040Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8153619Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8154776Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8155121Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8155752Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8156243Z [rank2]:E1204 10:10:09.949000 95326 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.8156329Z dist init r=1, world=4 2025-12-04T10:13:48.8156421Z dist init r=0, world=4 2025-12-04T10:13:48.8156505Z dist init r=2, world=4 2025-12-04T10:13:48.8156586Z dist init r=3, world=4 2025-12-04T10:13:48.8157623Z [rank0]:[W1204 10:10:10.487010659 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.8157712Z FAILED [10.2140s] [100%] 2025-12-04T10:13:48.8157717Z 2025-12-04T10:13:48.8157845Z =================================== FAILURES =================================== 2025-12-04T10:13:48.8158254Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda _ 2025-12-04T10:13:48.8158360Z Traceback (most recent call last): 2025-12-04T10:13:48.8158875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.8158974Z self._join_processes(fn) 2025-12-04T10:13:48.8159490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.8159619Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.8160318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.8160433Z raise RuntimeError(error) 2025-12-04T10:13:48.8160651Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.8160801Z Traceback (most recent call last): 2025-12-04T10:13:48.8161681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8161863Z getattr(self, test_name)() 2025-12-04T10:13:48.8162695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8162854Z fn() 2025-12-04T10:13:48.8163459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8163567Z method(*args, **kwargs) 2025-12-04T10:13:48.8164036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8164135Z method(*args, **kwargs) 2025-12-04T10:13:48.8164610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8164741Z with policy(): 2025-12-04T10:13:48.8165215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8165323Z raise RuntimeError(msg) 2025-12-04T10:13:48.8166622Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 611254272 and is now 628031488. 
2025-12-04T10:13:48.8166629Z 2025-12-04T10:13:48.8166840Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8167659Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8167665Z 2025-12-04T10:13:48.8167920Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8167928Z 2025-12-04T10:13:48.8167932Z 2025-12-04T10:13:48.8168141Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.8168389Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.8169157Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3b5d6f54eb5c8ad3.xml - 2025-12-04T10:13:48.8169323Z =========================== short test summary info ============================ 2025-12-04T10:13:48.8170274Z FAILED [10.2140s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.8170388Z Traceback (most recent call last): 2025-12-04T10:13:48.8170901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8171014Z getattr(self, test_name)() 2025-12-04T10:13:48.8171544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8171637Z fn() 2025-12-04T10:13:48.8172108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8172208Z method(*args, **kwargs) 2025-12-04T10:13:48.8172689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8172788Z method(*args, **kwargs) 2025-12-04T10:13:48.8173352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8173452Z with policy(): 2025-12-04T10:13:48.8174113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8174231Z raise RuntimeError(msg) 2025-12-04T10:13:48.8175591Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 611254272 and is now 628031488. 2025-12-04T10:13:48.8175598Z 2025-12-04T10:13:48.8175849Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8176700Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8176708Z 2025-12-04T10:13:48.8176970Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8177154Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
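Note: two other warnings recur in this log — barrier() guessing the device because `device_id` was not passed to `init_process_group`, and ProcessGroupNCCL noting that `destroy_process_group()` was never called before exit. Below is a hedged sketch of the setup/teardown both warnings point at, assuming the launcher provides RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT in the environment.

```python
# Sketch of the setup/teardown the two recurring warnings point at: bind the
# default process group to this rank's device at init time (addresses the
# barrier() device warning) and destroy it explicitly before the process exits
# (addresses the ProcessGroupNCCL resource-leak warning).
# Assumes RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT are set by the launcher.
import os
import torch
import torch.distributed as dist

def main() -> None:
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    dist.init_process_group(
        backend="nccl",
        rank=rank,
        world_size=world_size,
        device_id=torch.device("cuda", rank % torch.cuda.device_count()),
    )
    try:
        dist.barrier()
        # ... per-rank test or training body ...
    finally:
        dist.destroy_process_group()

if __name__ == "__main__":
    main()
```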
2025-12-04T10:13:48.8177361Z ====================== 1 failed, 32 deselected in 10.43s ======================= 2025-12-04T10:13:48.8177455Z Got exit code 1 2025-12-04T10:13:48.8177571Z Retrying single test... 2025-12-04T10:13:48.8178197Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b2eb1b61ddd90ac8.xml 2025-12-04T10:13:48.8178363Z ============================= test session starts ============================== 2025-12-04T10:13:48.8178988Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.8179171Z cachedir: .pytest_cache 2025-12-04T10:13:48.8179696Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.8179816Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.8179922Z configfile: pytest.ini 2025-12-04T10:13:48.8180469Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.8180685Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.8181617Z stepcurrent: skipping 28 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8181729Z Running 1 items in this shard 2025-12-04T10:13:48.8181735Z 2025-12-04T10:13:48.8182937Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda I1204 10:10:16.990000 95609 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 95661 2025-12-04T10:13:48.8183441Z I1204 10:10:16.990000 95609 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 95662 2025-12-04T10:13:48.8183934Z I1204 10:10:16.991000 95609 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 95663 2025-12-04T10:13:48.8184465Z I1204 10:10:16.992000 95609 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 95664 2025-12-04T10:13:48.8186479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8186592Z _warn_cpu_init() 2025-12-04T10:13:48.8188591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.8188699Z _warn_cpu_init() 2025-12-04T10:13:48.8190929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8191025Z _warn_cpu_init() 2025-12-04T10:13:48.8191900Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.8192033Z return func(*args, **kwargs) 2025-12-04T10:13:48.8193813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8193923Z _warn_cpu_init() 2025-12-04T10:13:48.8194336Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8194804Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8195685Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8196139Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8197011Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8197373Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8198217Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8198684Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8199531Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8199959Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8200826Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8201214Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8202271Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8202727Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8204469Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 0. CUDA driver allocated memory was 714014720 and is now 737083392. 2025-12-04T10:13:48.8204811Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8205462Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8206678Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8207019Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8207727Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8208235Z [rank0]:E1204 10:10:24.841000 95661 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.8208668Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8209163Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8210098Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8210579Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8211503Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8211885Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8212822Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8213373Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8214485Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8214968Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8215928Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8216374Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8217372Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8217858Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8219681Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 628031488. 
2025-12-04T10:13:48.8220072Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8220735Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8222019Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8222410Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8223133Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8223678Z [rank1]:E1204 10:10:24.841000 95662 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.8224141Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8224666Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8225777Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8226350Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8227248Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8227611Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8228457Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8228899Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8229743Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8230181Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8231030Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8231447Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8232301Z [rank2]:E1204 10:10:24.842000 95663 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8232734Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8234378Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 611254272 and is now 628031488. 2025-12-04T10:13:48.8234698Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8235285Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8236449Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8236768Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8237410Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8237892Z [rank2]:E1204 10:10:24.842000 95663 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.8238300Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8238768Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8239658Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8240132Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8241004Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8241368Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8242215Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8242651Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8243495Z [rank3]:E1204 10:10:24.843000 95664 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8243937Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8244804Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8245201Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8246064Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8246524Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8248148Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T10:13:48.8248491Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8249083Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8250225Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8250540Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8251185Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8251668Z [rank3]:E1204 10:10:24.843000 95664 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.8251765Z dist init r=1, world=4 2025-12-04T10:13:48.8251850Z dist init r=3, world=4 2025-12-04T10:13:48.8251932Z dist init r=2, world=4 2025-12-04T10:13:48.8252028Z dist init r=0, world=4 2025-12-04T10:13:48.8253074Z [rank0]:[W1204 10:10:25.463315995 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.8253174Z FAILED [10.4413s] [100%] 2025-12-04T10:13:48.8253180Z 2025-12-04T10:13:48.8253362Z =================================== FAILURES =================================== 2025-12-04T10:13:48.8253957Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda _ 2025-12-04T10:13:48.8254093Z Traceback (most recent call last): 2025-12-04T10:13:48.8254634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.8254752Z self._join_processes(fn) 2025-12-04T10:13:48.8255337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.8255477Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.8256087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.8256199Z raise RuntimeError(error) 2025-12-04T10:13:48.8256430Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.8256557Z Traceback (most recent call last): 2025-12-04T10:13:48.8257128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8257253Z getattr(self, test_name)() 2025-12-04T10:13:48.8257781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8257869Z fn() 2025-12-04T10:13:48.8258409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8258514Z method(*args, **kwargs) 2025-12-04T10:13:48.8259016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8259130Z method(*args, **kwargs) 2025-12-04T10:13:48.8259625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8259732Z with policy(): 2025-12-04T10:13:48.8260234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8260396Z raise RuntimeError(msg) 2025-12-04T10:13:48.8261763Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 628031488. 
2025-12-04T10:13:48.8261772Z 2025-12-04T10:13:48.8261982Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8262837Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8262843Z 2025-12-04T10:13:48.8263106Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8263111Z 2025-12-04T10:13:48.8263274Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.8263404Z Traceback (most recent call last): 2025-12-04T10:13:48.8263944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8264064Z getattr(self, test_name)() 2025-12-04T10:13:48.8264594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8264683Z fn() 2025-12-04T10:13:48.8265224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8265329Z method(*args, **kwargs) 2025-12-04T10:13:48.8266048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8266143Z method(*args, **kwargs) 2025-12-04T10:13:48.8266587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8266681Z with policy(): 2025-12-04T10:13:48.8267131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8267227Z raise RuntimeError(msg) 2025-12-04T10:13:48.8268446Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T10:13:48.8268452Z 2025-12-04T10:13:48.8268641Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8269423Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8269430Z 2025-12-04T10:13:48.8269664Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8269669Z 2025-12-04T10:13:48.8269673Z 2025-12-04T10:13:48.8269877Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.8270106Z Process 1 terminated with exit code 10, terminating remaining processes. 
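"Process 1 terminated with exit code 10, terminating remaining processes." and the _join_processes / _check_return_codes frames come from the multiprocess harness in torch/testing/_internal/common_distributed.py: the parent pytest process spawns one worker per rank, joins them, and re-raises any non-zero worker exit code as the RuntimeError shown in the report. The following is a rough sketch of that pattern, not the harness code; the function names are illustrative and the exit code 10 is taken from the log.

    import sys
    import traceback
    import torch.multiprocessing as mp

    TEST_ERROR_EXIT_CODE = 10  # the error code reported for a failed rank above

    def _worker(rank, world_size, fn):
        # Each rank runs the test body; exceptions are logged (compare the
        # "[rankN]: ... Caught exception" blocks above) and mapped to the
        # dedicated error exit code so the parent can report them.
        try:
            fn(rank, world_size)
        except Exception:
            traceback.print_exc()
            sys.exit(TEST_ERROR_EXIT_CODE)

    def run_in_processes(fn, world_size=4):
        # fn must be a module-level callable so it can be pickled for "spawn".
        ctx = mp.get_context("spawn")
        procs = [ctx.Process(target=_worker, args=(r, world_size, fn))
                 for r in range(world_size)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(
                    f"Process {rank} exited with error code {p.exitcode}"
                )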
2025-12-04T10:13:48.8270845Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b2eb1b61ddd90ac8.xml - 2025-12-04T10:13:48.8271006Z =========================== short test summary info ============================ 2025-12-04T10:13:48.8271896Z FAILED [10.4413s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.8272018Z Traceback (most recent call last): 2025-12-04T10:13:48.8272530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8272629Z getattr(self, test_name)() 2025-12-04T10:13:48.8273113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8273193Z fn() 2025-12-04T10:13:48.8273646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8273740Z method(*args, **kwargs) 2025-12-04T10:13:48.8274179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8274276Z method(*args, **kwargs) 2025-12-04T10:13:48.8274716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8274803Z with policy(): 2025-12-04T10:13:48.8275264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8275359Z raise RuntimeError(msg) 2025-12-04T10:13:48.8276582Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 628031488. 
2025-12-04T10:13:48.8276589Z 2025-12-04T10:13:48.8276807Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8277550Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8277570Z 2025-12-04T10:13:48.8277801Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8277808Z 2025-12-04T10:13:48.8277951Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.8278069Z Traceback (most recent call last): 2025-12-04T10:13:48.8278550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8278802Z getattr(self, test_name)() 2025-12-04T10:13:48.8279499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8279591Z fn() 2025-12-04T10:13:48.8280168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8280269Z method(*args, **kwargs) 2025-12-04T10:13:48.8280835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8280946Z method(*args, **kwargs) 2025-12-04T10:13:48.8281449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8281543Z with policy(): 2025-12-04T10:13:48.8282057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8282208Z raise RuntimeError(msg) 2025-12-04T10:13:48.8283576Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T10:13:48.8283583Z 2025-12-04T10:13:48.8283789Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8284633Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8284680Z 2025-12-04T10:13:48.8284940Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8285116Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
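The UserWarning repeated in the runs above (torch/distributed/fsdp/_init_utils.py: "The passed-in `module` is on CPU ...") recommends passing `device_id` so FSDP moves the module to the local GPU before sharding initialization, and points out that `sync_module_states=True` needs the module on a GPU. A minimal sketch of that recommendation, assuming a process group is already initialized and LOCAL_RANK is set by the launcher; the wrapped module here is a placeholder:

    import os
    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    local_rank = int(os.environ["LOCAL_RANK"])   # assumed set by torchrun or the test harness
    model = nn.Linear(1024, 1024)                # placeholder module, still on CPU here

    fsdp_model = FSDP(
        model,
        device_id=torch.device("cuda", local_rank),  # move to GPU for sharding init
        sync_module_states=True,                     # requires the module on a GPU device
    )

In the offload_true runs above the wrapped module starts on CPU, which is why the warning appears once per rank.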
2025-12-04T10:13:48.8285308Z ====================== 1 failed, 32 deselected in 10.66s ======================= 2025-12-04T10:13:48.8285403Z Got exit code 1 2025-12-04T10:13:48.8286161Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T10:13:48.8288866Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.8289485Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e6810d3a1c38013d.xml 2025-12-04T10:13:48.8289649Z ============================= test session starts ============================== 2025-12-04T10:13:48.8290008Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.8290117Z cachedir: .pytest_cache 2025-12-04T10:13:48.8290636Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.8290757Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.8290861Z configfile: pytest.ini 2025-12-04T10:13:48.8291554Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.8291774Z collecting ... collected 60 items / 29 deselected / 31 selected 2025-12-04T10:13:48.8291899Z stepcurrent: skipping 29 already run items. 2025-12-04T10:13:48.8292008Z Running 4 items in this shard 2025-12-04T10:13:48.8292014Z 2025-12-04T10:13:48.8293118Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda I1204 10:10:31.879000 95946 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 95998 2025-12-04T10:13:48.8293825Z I1204 10:10:31.880000 95946 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 95999 2025-12-04T10:13:48.8294321Z I1204 10:10:31.881000 95946 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 96000 2025-12-04T10:13:48.8294805Z I1204 10:10:31.882000 95946 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 96001 2025-12-04T10:13:48.8296869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8296967Z _warn_cpu_init() 2025-12-04T10:13:48.8298974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.8299105Z _warn_cpu_init() 2025-12-04T10:13:48.8301099Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8301197Z _warn_cpu_init() 2025-12-04T10:13:48.8303198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8303349Z _warn_cpu_init() 2025-12-04T10:13:48.8304352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.8304462Z return func(*args, **kwargs) 2025-12-04T10:13:48.8304924Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8305466Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8306630Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8307085Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8307963Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8308309Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8309163Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8309599Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8310799Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8311237Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8312093Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8312522Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8313375Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8313820Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8315456Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 2025-12-04T10:13:48.8315788Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8316373Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8317558Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8317919Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8318563Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8319044Z [rank0]:E1204 10:10:39.680000 95998 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.8319442Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8319942Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8320827Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8321287Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8322157Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8322514Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8323365Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8323822Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8324674Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8325101Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8325985Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8326376Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8327237Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8327666Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8329307Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 1. CUDA driver allocated memory was 607059968 and is now 628031488. 
2025-12-04T10:13:48.8329634Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8330245Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8331433Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8331753Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8332394Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8332899Z [rank1]:E1204 10:10:39.681000 95999 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.8333372Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8334050Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8335045Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8335559Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8336548Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8336957Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8337950Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8338435Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8339396Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8339907Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8340872Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8341319Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8342281Z [rank2]:E1204 10:10:39.681000 96000 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8342763Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8344619Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 2. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T10:13:48.8345022Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8345680Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8346989Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8347347Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8347985Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8348474Z [rank2]:E1204 10:10:39.681000 96000 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.8348871Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8349344Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8350227Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8350684Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8351583Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8351942Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8352784Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8353241Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8354097Z [rank3]:E1204 10:10:39.682000 96001 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8354529Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8355382Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8355771Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8356633Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8357062Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8358811Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 3. CUDA driver allocated memory was 611254272 and is now 628031488. 2025-12-04T10:13:48.8359147Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8359731Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8360952Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8361275Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8361944Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8362425Z [rank3]:E1204 10:10:39.682000 96001 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.8362514Z dist init r=0, world=4 2025-12-04T10:13:48.8362612Z dist init r=2, world=4 2025-12-04T10:13:48.8362696Z dist init r=1, world=4 2025-12-04T10:13:48.8362778Z dist init r=3, world=4 2025-12-04T10:13:48.8363942Z [rank0]:[W1204 10:10:40.190071807 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.8364092Z FAILED [9.4751s] [ 25%] 2025-12-04T10:13:48.8364101Z 2025-12-04T10:13:48.8364369Z =================================== FAILURES =================================== 2025-12-04T10:13:48.8365191Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:48.8365380Z Traceback (most recent call last): 2025-12-04T10:13:48.8366242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.8366468Z self._join_processes(fn) 2025-12-04T10:13:48.8367402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.8367642Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.8368636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.8368814Z raise RuntimeError(error) 2025-12-04T10:13:48.8369159Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.8369346Z Traceback (most recent call last): 2025-12-04T10:13:48.8370156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8370326Z getattr(self, test_name)() 2025-12-04T10:13:48.8371194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8371336Z fn() 2025-12-04T10:13:48.8372133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8372317Z method(*args, **kwargs) 2025-12-04T10:13:48.8373372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8373681Z method(*args, **kwargs) 2025-12-04T10:13:48.8374780Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8374961Z with policy(): 2025-12-04T10:13:48.8375975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8376160Z raise RuntimeError(msg) 2025-12-04T10:13:48.8379081Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 1. CUDA driver allocated memory was 607059968 and is now 628031488. 
2025-12-04T10:13:48.8379104Z 2025-12-04T10:13:48.8379480Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8381154Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8381166Z 2025-12-04T10:13:48.8381665Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8381676Z 2025-12-04T10:13:48.8381684Z 2025-12-04T10:13:48.8382083Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.8382573Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.8384133Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e6810d3a1c38013d.xml - 2025-12-04T10:13:48.8384451Z =========================== short test summary info ============================ 2025-12-04T10:13:48.8386597Z FAILED [9.4751s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.8386826Z Traceback (most recent call last): 2025-12-04T10:13:48.8387860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8388066Z getattr(self, test_name)() 2025-12-04T10:13:48.8389018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8389307Z fn() 2025-12-04T10:13:48.8390238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8390423Z method(*args, **kwargs) 2025-12-04T10:13:48.8391355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8391518Z method(*args, **kwargs) 2025-12-04T10:13:48.8392375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8392531Z with policy(): 2025-12-04T10:13:48.8393370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8393563Z raise RuntimeError(msg) 2025-12-04T10:13:48.8395932Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 1. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T10:13:48.8395951Z 2025-12-04T10:13:48.8396332Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8397819Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8397978Z 2025-12-04T10:13:48.8398450Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8398741Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
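Editor's note: the failure above comes from PyTorch's CUDA memory-leak checker, which snapshots caching-allocator and driver-level memory on every rank before and after the test body and raises when either grows (here the allocator goes from 512 bytes to roughly 13-17 KB and driver-allocated memory grows by about 20 MB per device). The quoted repro command re-runs only this test with the check enabled, and PYTORCH_PRINT_REPRO_ON_FAILURE=0 merely hides the repro hint. A minimal sketch of the same before/after idea follows; it is illustrative only, not the actual checker in torch/testing/_internal/common_utils.py.

import torch

def assert_no_cuda_leak(fn, device=0):
    # Illustrative before/after check; the real checker also retries and
    # walks every visible device.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes
    free_before, _ = torch.cuda.mem_get_info(device)     # driver-level free bytes
    fn()
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    if alloc_after > alloc_before or free_after < free_before:
        raise RuntimeError(
            f"possible CUDA leak: allocator {alloc_before} -> {alloc_after} B, "
            f"driver free {free_before} -> {free_after} B"
        )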
2025-12-04T10:13:48.8399015Z ======================= 1 failed, 29 deselected in 9.69s ======================= 2025-12-04T10:13:48.8399180Z Got exit code 1 2025-12-04T10:13:48.8399335Z Retrying single test... 2025-12-04T10:13:48.8400335Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-53dff883b0afb17e.xml 2025-12-04T10:13:48.8400612Z ============================= test session starts ============================== 2025-12-04T10:13:48.8401257Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.8401419Z cachedir: .pytest_cache 2025-12-04T10:13:48.8402206Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.8402324Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.8402431Z configfile: pytest.ini 2025-12-04T10:13:48.8402939Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.8403146Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.8404051Z stepcurrent: skipping 29 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8404158Z Running 1 items in this shard 2025-12-04T10:13:48.8404168Z 2025-12-04T10:13:48.8405392Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda I1204 10:10:46.370000 96283 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 96335 2025-12-04T10:13:48.8405866Z I1204 10:10:46.371000 96283 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 96336 2025-12-04T10:13:48.8406337Z I1204 10:10:46.371000 96283 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 96337 2025-12-04T10:13:48.8406796Z I1204 10:10:46.372000 96283 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 96338 2025-12-04T10:13:48.8408727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8408833Z _warn_cpu_init() 2025-12-04T10:13:48.8410708Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.8410812Z _warn_cpu_init() 2025-12-04T10:13:48.8412695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8412833Z _warn_cpu_init() 2025-12-04T10:13:48.8414050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.8414175Z return func(*args, **kwargs) 2025-12-04T10:13:48.8416207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8416319Z _warn_cpu_init() 2025-12-04T10:13:48.8416777Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8417311Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8418319Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8418825Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8419815Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8420242Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8421211Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8421705Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8422691Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8423184Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8424140Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8424589Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8425659Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8426225Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8427877Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 2025-12-04T10:13:48.8428252Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8428838Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8430044Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8430369Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8431003Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8431488Z [rank0]:E1204 10:10:54.404000 96335 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.8431885Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8432350Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8433242Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8433692Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8434598Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8434950Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8435808Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8436265Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8437109Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8437547Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8438391Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8438791Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8439641Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8440114Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8441758Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 607059968 and is now 628031488. 
2025-12-04T10:13:48.8442078Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8442696Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8443879Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8444214Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8444841Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8445333Z [rank1]:E1204 10:10:54.406000 96336 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.8445732Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8446200Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8447127Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8447574Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8448457Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8448834Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8449691Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8450125Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8450969Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8451407Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8452267Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8452668Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8453796Z [rank2]:E1204 10:10:54.406000 96337 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8454292Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8456142Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 609157120 and is now 628031488. 2025-12-04T10:13:48.8456545Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8457215Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8458544Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8458916Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8459629Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8460185Z [rank2]:E1204 10:10:54.406000 96337 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.8460635Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8461191Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8462202Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8462706Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8463731Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8464125Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8465097Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8465578Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8466687Z [rank3]:E1204 10:10:54.412000 96338 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8467132Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8467979Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8468415Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8469264Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8469709Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8471381Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T10:13:48.8471715Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8472297Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8473475Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8473808Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8474440Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8474959Z [rank3]:E1204 10:10:54.412000 96338 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.8475049Z dist init r=0, world=4 2025-12-04T10:13:48.8475136Z dist init r=1, world=4 2025-12-04T10:13:48.8475228Z dist init r=2, world=4 2025-12-04T10:13:48.8475315Z dist init r=3, world=4 2025-12-04T10:13:48.8476337Z [rank0]:[W1204 10:10:54.913831913 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.8476480Z FAILED [10.4176s] [100%] 2025-12-04T10:13:48.8476486Z 2025-12-04T10:13:48.8476614Z =================================== FAILURES =================================== 2025-12-04T10:13:48.8477067Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:48.8477177Z Traceback (most recent call last): 2025-12-04T10:13:48.8477658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.8477768Z self._join_processes(fn) 2025-12-04T10:13:48.8478284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.8478420Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.8479320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.8479435Z raise RuntimeError(error) 2025-12-04T10:13:48.8479683Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.8479894Z Traceback (most recent call last): 2025-12-04T10:13:48.8480520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8480634Z getattr(self, test_name)() 2025-12-04T10:13:48.8481159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8481258Z fn() 2025-12-04T10:13:48.8481762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8481862Z method(*args, **kwargs) 2025-12-04T10:13:48.8482380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8482482Z method(*args, **kwargs) 2025-12-04T10:13:48.8483030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8483130Z with policy(): 2025-12-04T10:13:48.8483638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8483760Z raise RuntimeError(msg) 2025-12-04T10:13:48.8485160Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 609157120 and is now 628031488. 
2025-12-04T10:13:48.8485168Z 2025-12-04T10:13:48.8485397Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8486278Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8486285Z 2025-12-04T10:13:48.8486548Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8486556Z 2025-12-04T10:13:48.8486572Z 2025-12-04T10:13:48.8486834Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.8487101Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.8487911Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-53dff883b0afb17e.xml - 2025-12-04T10:13:48.8488079Z =========================== short test summary info ============================ 2025-12-04T10:13:48.8489161Z FAILED [10.4176s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.8489292Z Traceback (most recent call last): 2025-12-04T10:13:48.8489838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8489965Z getattr(self, test_name)() 2025-12-04T10:13:48.8490502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8490587Z fn() 2025-12-04T10:13:48.8491212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8491317Z method(*args, **kwargs) 2025-12-04T10:13:48.8491885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8491987Z method(*args, **kwargs) 2025-12-04T10:13:48.8492435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8492529Z with policy(): 2025-12-04T10:13:48.8492978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8493104Z raise RuntimeError(msg) 2025-12-04T10:13:48.8494689Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 609157120 and is now 628031488. 2025-12-04T10:13:48.8494697Z 2025-12-04T10:13:48.8494914Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8495818Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8495824Z 2025-12-04T10:13:48.8496122Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8496309Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
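Editor's note: the ProcessGroupNCCL warning printed at the end of each run ("destroy_process_group() was not called before program exit") is a separate, non-fatal issue: the spawned workers exit without tearing down the NCCL process group. The usual per-rank cleanup looks like the sketch below, assuming the standard env-var rendezvous (MASTER_ADDR/MASTER_PORT already set); only the torch.distributed calls are real APIs, the surrounding function is hypothetical.

import torch.distributed as dist

def run_rank(rank, world_size):
    # Hypothetical worker body for one spawned process.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        pass  # test body / training step goes here
    finally:
        dist.barrier()                # let all ranks finish first
        dist.destroy_process_group()  # silences the ProcessGroupNCCL shutdown warning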
2025-12-04T10:13:48.8496492Z ====================== 1 failed, 32 deselected in 10.63s ======================= 2025-12-04T10:13:48.8496590Z Got exit code 1 2025-12-04T10:13:48.8496706Z Retrying single test... 2025-12-04T10:13:48.8497331Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2c92b48ae22d6a39.xml 2025-12-04T10:13:48.8497488Z ============================= test session starts ============================== 2025-12-04T10:13:48.8497844Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.8497953Z cachedir: .pytest_cache 2025-12-04T10:13:48.8498479Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.8498600Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.8498712Z configfile: pytest.ini 2025-12-04T10:13:48.8499253Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.8499496Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.8500468Z stepcurrent: skipping 29 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8500579Z Running 1 items in this shard 2025-12-04T10:13:48.8500585Z 2025-12-04T10:13:48.8501855Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda I1204 10:11:01.309000 96620 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 96672 2025-12-04T10:13:48.8502359Z I1204 10:11:01.310000 96620 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 96673 2025-12-04T10:13:48.8502849Z I1204 10:11:01.311000 96620 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 96674 2025-12-04T10:13:48.8503348Z I1204 10:11:01.312000 96620 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 96675 2025-12-04T10:13:48.8505372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8505481Z _warn_cpu_init() 2025-12-04T10:13:48.8507491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.8507610Z _warn_cpu_init() 2025-12-04T10:13:48.8510101Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8510263Z _warn_cpu_init() 2025-12-04T10:13:48.8512477Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8512572Z _warn_cpu_init() 2025-12-04T10:13:48.8513516Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.8513620Z return func(*args, **kwargs) 2025-12-04T10:13:48.8514065Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8514568Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8515547Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8516031Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8516958Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8517376Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8518271Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8518742Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8519638Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8520091Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8520998Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8521453Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8522461Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8522892Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8524578Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 1. CUDA driver allocated memory was 611254272 and is now 628031488. 2025-12-04T10:13:48.8524905Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8525485Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8526678Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8527005Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8527650Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8528129Z [rank1]:E1204 10:11:09.173000 96673 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.8528581Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8529052Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8529928Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8530412Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8531288Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8531653Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8532495Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8532932Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8534069Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8534560Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8535574Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8536018Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8536987Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8537479Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8539369Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 716111872 and is now 737083392. 
2025-12-04T10:13:48.8539735Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8540391Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8541738Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8542101Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8542864Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8543406Z [rank0]:E1204 10:11:09.173000 96672 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.8543864Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8544389Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8545417Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8546127Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8547002Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8547365Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8548208Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8548654Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8549501Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8549958Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8550823Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8551217Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8552104Z [rank2]:E1204 10:11:09.173000 96674 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8552536Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8554185Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 2. CUDA driver allocated memory was 609157120 and is now 628031488. 2025-12-04T10:13:48.8554504Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8555086Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8556304Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8556625Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8557265Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8557744Z [rank2]:E1204 10:11:09.173000 96674 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.8558181Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8558649Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8559536Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8559993Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8560864Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8561225Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8562072Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8562538Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8563384Z [rank3]:E1204 10:11:09.175000 96675 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8563810Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8564661Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8565077Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8565952Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8566381Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8568028Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 3. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T10:13:48.8568355Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8568939Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8570153Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8570474Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8571144Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8571630Z [rank3]:E1204 10:11:09.175000 96675 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.8571729Z dist init r=1, world=4 2025-12-04T10:13:48.8571823Z dist init r=0, world=4 2025-12-04T10:13:48.8571909Z dist init r=2, world=4 2025-12-04T10:13:48.8572004Z dist init r=3, world=4 2025-12-04T10:13:48.8573030Z [rank0]:[W1204 10:11:09.682670804 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.8573122Z FAILED [9.5971s] [100%] 2025-12-04T10:13:48.8573136Z 2025-12-04T10:13:48.8573334Z =================================== FAILURES =================================== 2025-12-04T10:13:48.8573968Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda _ 2025-12-04T10:13:48.8574097Z Traceback (most recent call last): 2025-12-04T10:13:48.8574646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.8574798Z self._join_processes(fn) 2025-12-04T10:13:48.8575390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.8575529Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.8576142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.8576252Z raise RuntimeError(error) 2025-12-04T10:13:48.8576487Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.8576614Z Traceback (most recent call last): 2025-12-04T10:13:48.8577152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8577303Z getattr(self, test_name)() 2025-12-04T10:13:48.8577842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8577935Z fn() 2025-12-04T10:13:48.8578453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8578560Z method(*args, **kwargs) 2025-12-04T10:13:48.8579264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8579378Z method(*args, **kwargs) 2025-12-04T10:13:48.8579879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8579979Z with policy(): 2025-12-04T10:13:48.8580494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8580604Z raise RuntimeError(msg) 2025-12-04T10:13:48.8582083Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 716111872 and is now 737083392. 
2025-12-04T10:13:48.8582093Z 2025-12-04T10:13:48.8582311Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8583203Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8583318Z 2025-12-04T10:13:48.8583585Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8583591Z 2025-12-04T10:13:48.8583596Z 2025-12-04T10:13:48.8583823Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.8584094Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.8584900Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2c92b48ae22d6a39.xml - 2025-12-04T10:13:48.8585075Z =========================== short test summary info ============================ 2025-12-04T10:13:48.8586118Z FAILED [9.5971s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.8586240Z Traceback (most recent call last): 2025-12-04T10:13:48.8586792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8586907Z getattr(self, test_name)() 2025-12-04T10:13:48.8587448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8587578Z fn() 2025-12-04T10:13:48.8588078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8588193Z method(*args, **kwargs) 2025-12-04T10:13:48.8588693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8588800Z method(*args, **kwargs) 2025-12-04T10:13:48.8589294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8589388Z with policy(): 2025-12-04T10:13:48.8589896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8590004Z raise RuntimeError(msg) 2025-12-04T10:13:48.8591605Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 716111872 and is now 737083392. 2025-12-04T10:13:48.8591622Z 2025-12-04T10:13:48.8591924Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8592708Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8592713Z 2025-12-04T10:13:48.8592952Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8593109Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
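The RuntimeError above is produced by the CUDA memory-leak checker that this job enables via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: it records caching-allocator and driver-level memory on each device before the test body and compares the readings afterwards, failing the test if both grew. A minimal sketch of that idea, purely illustrative and not the torch.testing._internal implementation (the helper name cuda_leak_check and the single-device focus are assumptions):

import contextlib
import gc

import torch

@contextlib.contextmanager
def cuda_leak_check(device: int = 0):
    """Fail if CUDA memory on `device` grows across the wrapped block (illustrative sketch)."""
    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)      # caching-allocator bytes
    free_before, total = torch.cuda.mem_get_info(device)
    driver_before = total - free_before                     # driver-level bytes in use
    yield
    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after
    # Only flag a leak when both the allocator and the driver report growth,
    # mirroring the before/after numbers quoted in the failure message above.
    if alloc_after > alloc_before and driver_after > driver_before:
        raise RuntimeError(
            f"CUDA leak suspected on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after}, driver {driver_before} -> {driver_after}"
        )

Wrapping the failing test body in such a context manager reproduces the kind of before/after readings shown in the log; as the wording "CUDA driver API confirmed a leak" indicates, the real harness also cross-checks the allocator reading against the driver before raising.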
2025-12-04T10:13:48.8593269Z ======================= 1 failed, 32 deselected in 9.81s ======================= 2025-12-04T10:13:48.8593355Z Got exit code 1 2025-12-04T10:13:48.8594067Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8594465Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.8595016Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bb06b260fb006313.xml 2025-12-04T10:13:48.8595164Z ============================= test session starts ============================== 2025-12-04T10:13:48.8595471Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.8595591Z cachedir: .pytest_cache 2025-12-04T10:13:48.8596054Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.8596163Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.8596253Z configfile: pytest.ini 2025-12-04T10:13:48.8596733Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.8596923Z collecting ... collected 60 items / 30 deselected / 30 selected 2025-12-04T10:13:48.8597053Z stepcurrent: skipping 30 already run items. 2025-12-04T10:13:48.8597149Z Running 3 items in this shard 2025-12-04T10:13:48.8597154Z 2025-12-04T10:13:48.8598058Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda I1204 10:11:15.839000 96957 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 97009 2025-12-04T10:13:48.8598505Z I1204 10:11:15.840000 96957 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 97010 2025-12-04T10:13:48.8598940Z I1204 10:11:15.841000 96957 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 97011 2025-12-04T10:13:48.8599373Z I1204 10:11:15.842000 96957 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 97012 2025-12-04T10:13:48.8600500Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.8600610Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.8601706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.8601814Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.8602931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.8603044Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.8604130Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.8604239Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.8606021Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8606117Z _warn_cpu_init() 2025-12-04T10:13:48.8607908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8608004Z _warn_cpu_init() 2025-12-04T10:13:48.8609797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8609889Z _warn_cpu_init() 2025-12-04T10:13:48.8611647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8611738Z _warn_cpu_init() 2025-12-04T10:13:48.8612613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
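The three UserWarnings emitted during this setup phase each point at a concrete knob: construct the TransformerEncoderLayer with batch_first=True, pass device_id to init_process_group so collectives such as barrier() know which device to use, and pass device_id to FSDP so sharding initialization runs on the GPU instead of the CPU. A minimal sketch of a setup that addresses all three, assuming a torchrun launch (the model shape, LOCAL_RANK handling, and wrapping granularity are illustrative, not what test_fsdp_core.py actually does):

import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def setup_and_wrap() -> FSDP:
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)
    # Passing device_id binds the process group (and barrier()) to this GPU,
    # as the c10d warning above suggests.
    dist.init_process_group("nccl", device_id=torch.device("cuda", local_rank))

    # batch_first=True enables the nested-tensor fast path the transformer warning refers to.
    layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    model = nn.TransformerEncoder(layer, num_layers=2)

    # device_id tells FSDP to move the module and run sharding init on this GPU
    # instead of warning about CPU initialization.
    return FSDP(model, device_id=local_rank)

Launched with torchrun --nproc-per-node=4, this mirrors the four-process world size ("dist init r=..., world=4") that the log shows for each test run.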
2025-12-04T10:13:48.8612709Z return func(*args, **kwargs) 2025-12-04T10:13:48.8613154Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8613864Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8614866Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8615365Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8616389Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8616787Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8617746Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8618230Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8619180Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8619670Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8620624Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8621100Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8622058Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8622542Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8624203Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 0. CUDA driver allocated memory was 718209024 and is now 760152064. 
2025-12-04T10:13:48.8624568Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8625228Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8626477Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8626805Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8627438Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8627961Z [rank0]:E1204 10:11:26.995000 97009 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.8628366Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8628827Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8629718Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8630190Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8631062Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8631421Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8632265Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8632700Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8633547Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8634032Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8634896Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8635285Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8636141Z [rank2]:E1204 10:11:26.995000 97011 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8636600Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8638048Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.8638370Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8638957Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8639926Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8640247Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8640907Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8641387Z [rank2]:E1204 10:11:26.995000 97011 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.8641791Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8642258Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8643176Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8643626Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8644500Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8644855Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8645705Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8646147Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8646995Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8647457Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8648303Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8648697Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8649575Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8650006Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8651459Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 3. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T10:13:48.8651779Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8652364Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8653381Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8653928Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8654640Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8655178Z [rank3]:E1204 10:11:26.996000 97012 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.8655634Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8656189Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8657193Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8657704Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8658683Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", 
line 772, in wrapper 2025-12-04T10:13:48.8659080Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8660045Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8660534Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8661515Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8662003Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8662956Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8663428Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8664394Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8664882Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8666551Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 604962816 and is now 651100160. 
2025-12-04T10:13:48.8666873Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8667469Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8668453Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8668770Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8669411Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8669892Z [rank1]:E1204 10:11:27.001000 97010 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.8669988Z dist init r=0, world=4 2025-12-04T10:13:48.8670364Z dist init r=2, world=4 2025-12-04T10:13:48.8670452Z dist init r=1, world=4 2025-12-04T10:13:48.8670539Z dist init r=3, world=4 2025-12-04T10:13:48.8671562Z [rank0]:[W1204 10:11:27.512811826 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.8671658Z FAILED [12.8411s] [ 33%] 2025-12-04T10:13:48.8671664Z 2025-12-04T10:13:48.8671796Z =================================== FAILURES =================================== 2025-12-04T10:13:48.8672057Z ________ TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda ________ 2025-12-04T10:13:48.8672167Z Traceback (most recent call last): 2025-12-04T10:13:48.8672654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.8672754Z self._join_processes(fn) 2025-12-04T10:13:48.8673264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.8673390Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.8673953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.8674052Z raise RuntimeError(error) 2025-12-04T10:13:48.8674255Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.8674363Z Traceback (most recent call last): 2025-12-04T10:13:48.8674840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8674981Z getattr(self, test_name)() 2025-12-04T10:13:48.8675445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8675521Z fn() 2025-12-04T10:13:48.8675971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8676064Z method(*args, **kwargs) 2025-12-04T10:13:48.8676508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T10:13:48.8676603Z method(*args, **kwargs) 2025-12-04T10:13:48.8677045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8677133Z with policy(): 2025-12-04T10:13:48.8677577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8677675Z raise RuntimeError(msg) 2025-12-04T10:13:48.8678883Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.8678926Z 2025-12-04T10:13:48.8679297Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8679940Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8679946Z 2025-12-04T10:13:48.8684810Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8684824Z 2025-12-04T10:13:48.8684830Z 2025-12-04T10:13:48.8685080Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.8685353Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.8686172Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bb06b260fb006313.xml - 2025-12-04T10:13:48.8686444Z =========================== short test summary info ============================ 2025-12-04T10:13:48.8687262Z FAILED [12.8411s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.8687380Z Traceback (most recent call last): 2025-12-04T10:13:48.8687937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8688047Z getattr(self, test_name)() 2025-12-04T10:13:48.8688586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8688675Z fn() 2025-12-04T10:13:48.8689181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8689287Z method(*args, **kwargs) 2025-12-04T10:13:48.8689790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8689890Z method(*args, **kwargs) 2025-12-04T10:13:48.8690435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8690530Z with policy(): 2025-12-04T10:13:48.8691151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8691253Z raise RuntimeError(msg) 2025-12-04T10:13:48.8692417Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. 
CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.8692463Z 2025-12-04T10:13:48.8692659Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8693309Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8693318Z 2025-12-04T10:13:48.8693557Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8693893Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.8694068Z ====================== 1 failed, 30 deselected in 13.06s ======================= 2025-12-04T10:13:48.8694166Z Got exit code 1 2025-12-04T10:13:48.8694302Z Retrying single test... 2025-12-04T10:13:48.8695330Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-29ffb7b96244526a.xml 2025-12-04T10:13:48.8695627Z ============================= test session starts ============================== 2025-12-04T10:13:48.8696257Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.8696434Z cachedir: .pytest_cache 2025-12-04T10:13:48.8697462Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.8697668Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.8697849Z configfile: pytest.ini 2025-12-04T10:13:48.8698828Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.8699220Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.8700492Z stepcurrent: skipping 30 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8700696Z Running 1 items in this shard 2025-12-04T10:13:48.8700708Z 2025-12-04T10:13:48.8702591Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda I1204 10:11:33.729000 97294 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 97346 2025-12-04T10:13:48.8703495Z I1204 10:11:33.730000 97294 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 97347 2025-12-04T10:13:48.8704372Z I1204 10:11:33.731000 97294 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 97348 2025-12-04T10:13:48.8705276Z I1204 10:11:33.732000 97294 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 97349 2025-12-04T10:13:48.8707600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.8707840Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.8710055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.8710394Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.8712692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.8712919Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.8715272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.8715556Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.8719157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8719351Z _warn_cpu_init() 2025-12-04T10:13:48.8723013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T10:13:48.8723195Z _warn_cpu_init() 2025-12-04T10:13:48.8726878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8727058Z _warn_cpu_init() 2025-12-04T10:13:48.8730734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8730901Z _warn_cpu_init() 2025-12-04T10:13:48.8732209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T10:13:48.8732329Z return func(*args, **kwargs) 2025-12-04T10:13:48.8732775Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8733399Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8734567Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8735077Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8736137Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8736532Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8737489Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8738010Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8738971Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8739458Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8740413Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8740851Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8741814Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8742300Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8743958Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.8744325Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8744978Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8746204Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8746667Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8747299Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8747780Z [rank1]:E1204 10:11:44.735000 97347 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.8748174Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8748648Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8749529Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8750020Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8750894Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8751240Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8752113Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:13:48.8752540Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8753391Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8753816Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8754658Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8755057Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8755905Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8756371Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8757813Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 0. CUDA driver allocated memory was 709820416 and is now 760152064. 2025-12-04T10:13:48.8758139Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8758743Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8759709Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8760025Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8760659Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8761143Z [rank0]:E1204 10:11:44.735000 97346 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.8761543Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8762013Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8762918Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8763366Z [rank2]:E1204 10:11:44.735000 97348 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8764240Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8764617Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8765473Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8765903Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8766748Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8767173Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8768015Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8768412Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8769300Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8769734Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8771199Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 485425152 and is now 651100160. 
2025-12-04T10:13:48.8771529Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8772109Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8773074Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8773451Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8774319Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8774869Z [rank2]:E1204 10:11:44.735000 97348 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.8775315Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8775888Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8776886Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8777387Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8778410Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8779020Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8779985Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8780467Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8781424Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8781909Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8782859Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8783378Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8784328Z [rank3]:E1204 10:11:44.737000 97349 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8784820Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8786491Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 3. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T10:13:48.8786863Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8787511Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8788603Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8788963Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8789677Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8790222Z [rank3]:E1204 10:11:44.737000 97349 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.8790472Z dist init r=2, world=4 2025-12-04T10:13:48.8790565Z dist init r=1, world=4 2025-12-04T10:13:48.8790660Z dist init r=3, world=4 2025-12-04T10:13:48.8790855Z dist init r=0, world=4 2025-12-04T10:13:48.8791885Z [rank0]:[W1204 10:11:45.262473304 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.8792008Z FAILED [13.3143s] [100%] 2025-12-04T10:13:48.8792015Z 2025-12-04T10:13:48.8792142Z =================================== FAILURES =================================== 2025-12-04T10:13:48.8792417Z ________ TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda ________ 2025-12-04T10:13:48.8792520Z Traceback (most recent call last): 2025-12-04T10:13:48.8793008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.8793106Z self._join_processes(fn) 2025-12-04T10:13:48.8793615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.8793744Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.8794273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.8794372Z raise RuntimeError(error) 2025-12-04T10:13:48.8794583Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.8794686Z Traceback (most recent call last): 2025-12-04T10:13:48.8795168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8795293Z getattr(self, test_name)() 2025-12-04T10:13:48.8795763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8795842Z fn() 2025-12-04T10:13:48.8796282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8796372Z method(*args, **kwargs) 2025-12-04T10:13:48.8796820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8796910Z method(*args, **kwargs) 2025-12-04T10:13:48.8797351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8797437Z with policy(): 2025-12-04T10:13:48.8797918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8798018Z raise RuntimeError(msg) 2025-12-04T10:13:48.8799058Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.8799065Z 2025-12-04T10:13:48.8799257Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8799816Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8799823Z 2025-12-04T10:13:48.8800054Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8800065Z 2025-12-04T10:13:48.8800069Z 2025-12-04T10:13:48.8800263Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.8800493Z Process 1 terminated with exit code 10, terminating remaining processes. 
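The leak check that failed above compares per-device memory counters captured before and after the test body: on device 1 the caching-allocator figure went from 512 B to 227840 B (about 222 KiB retained) and the driver-level figure from 607059968 B to 651100160 B (42 MiB), so the test is flagged and the rank exits with code 10. Below is a minimal, simplified sketch of that before/after comparison using only public torch.cuda counters; it is not the actual leak-check context manager (the `policy()` seen in the traceback, whose `__exit__` raises the RuntimeError), just an illustration of the idea.

    import torch

    def check_for_cuda_leak(test_fn, device=0):
        # Sketch only: snapshot allocator- and driver-level usage, run the
        # test body, then compare. The real harness is more careful about
        # caching and retries; this only shows the before/after principle.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)
        free_before, total = torch.cuda.mem_get_info(device)
        driver_before = total - free_before

        test_fn()

        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after

        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: "
                f"allocator {alloc_before} -> {alloc_after}, "
                f"driver {driver_before} -> {driver_after}"
            )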
2025-12-04T10:13:48.8801233Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-29ffb7b96244526a.xml - 2025-12-04T10:13:48.8801380Z =========================== short test summary info ============================ 2025-12-04T10:13:48.8802101Z FAILED [13.3143s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.8802206Z Traceback (most recent call last): 2025-12-04T10:13:48.8802719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8802822Z getattr(self, test_name)() 2025-12-04T10:13:48.8803297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8803373Z fn() 2025-12-04T10:13:48.8803828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8803918Z method(*args, **kwargs) 2025-12-04T10:13:48.8804367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8804457Z method(*args, **kwargs) 2025-12-04T10:13:48.8804895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8804986Z with policy(): 2025-12-04T10:13:48.8805430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8805523Z raise RuntimeError(msg) 2025-12-04T10:13:48.8806568Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T10:13:48.8806601Z 2025-12-04T10:13:48.8806792Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8807361Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8807366Z 2025-12-04T10:13:48.8807598Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8807761Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.8807917Z ====================== 1 failed, 32 deselected in 13.53s ======================= 2025-12-04T10:13:48.8808000Z Got exit code 1 2025-12-04T10:13:48.8808121Z Retrying single test... 
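In the parent traceback summarized above, each rank runs in its own child process: wrapper() calls _join_processes(), which calls _check_return_codes(), and any child that exited with a nonzero code (10 here) is surfaced as the RuntimeError that fails the pytest item. The following is a minimal multiprocessing sketch of that parent-side pattern; TEST_ERROR_EXIT_CODE is an assumption chosen to match the exit code 10 printed in the log, and this is not the MultiProcessTestCase implementation itself.

    import multiprocessing as mp

    TEST_ERROR_EXIT_CODE = 10  # assumption: mirrors the exit code seen above

    def child(rank: int) -> None:
        # Stand-in for run_test(): an uncaught failure becomes a nonzero exit.
        raise SystemExit(TEST_ERROR_EXIT_CODE)

    def join_and_check(world_size: int = 4) -> None:
        procs = [mp.Process(target=child, args=(r,)) for r in range(world_size)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                # Mirrors the parent raising
                # "Process N exited with error code 10 and exception: ..."
                raise RuntimeError(
                    f"Process {rank} exited with error code {p.exitcode}"
                )

    if __name__ == "__main__":
        join_and_check()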
2025-12-04T10:13:48.8808670Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-64a83fa5a2cd03db.xml 2025-12-04T10:13:48.8808810Z ============================= test session starts ============================== 2025-12-04T10:13:48.8809123Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.8809214Z cachedir: .pytest_cache 2025-12-04T10:13:48.8809668Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.8809770Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.8809859Z configfile: pytest.ini 2025-12-04T10:13:48.8810342Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.8810529Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.8811157Z stepcurrent: skipping 30 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8811259Z Running 1 items in this shard 2025-12-04T10:13:48.8811264Z 2025-12-04T10:13:48.8812240Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda I1204 10:11:51.750000 97631 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 97683 2025-12-04T10:13:48.8812681Z I1204 10:11:51.750000 97631 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 97684 2025-12-04T10:13:48.8813116Z I1204 10:11:51.751000 97631 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 97685 2025-12-04T10:13:48.8813867Z I1204 10:11:51.752000 97631 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 97686 2025-12-04T10:13:48.8815108Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.8815236Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.8816466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.8816587Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.8817817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.8817941Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.8819159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.8819326Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.8821320Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8821422Z _warn_cpu_init() 2025-12-04T10:13:48.8823451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8823552Z _warn_cpu_init() 2025-12-04T10:13:48.8825648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8825748Z _warn_cpu_init() 2025-12-04T10:13:48.8827658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8827748Z _warn_cpu_init() 2025-12-04T10:13:48.8828621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T10:13:48.8828741Z return func(*args, **kwargs) 2025-12-04T10:13:48.8829151Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8829620Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8830514Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8830958Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8831834Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8832180Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8833023Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8833483Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8834329Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8834759Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8835625Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8836017Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8836883Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8837309Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8838755Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
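The _init_utils.py UserWarning repeated above is about wrapping a module that still lives on CPU: it recommends passing device_id so FSDP's sharding initialization runs on the GPU, which the warning also notes is needed for sync_module_states=True. A minimal sketch of that construction, assuming one GPU per rank; the module and wrapping shown here are placeholders, not the test's actual model setup.

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_gpu(module: torch.nn.Module, local_rank: int) -> FSDP:
        # device_id moves the CPU-constructed module to the local GPU before
        # sharding, avoiding the slow CPU-init warning, and lets
        # sync_module_states=True use GPU communication as the warning requires.
        return FSDP(
            module,
            device_id=torch.device("cuda", local_rank),
            sync_module_states=True,
        )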
2025-12-04T10:13:48.8839075Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8839682Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8840643Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8840966Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8841620Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8842100Z [rank0]:E1204 10:12:02.615000 97683 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.8842502Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8842967Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8843846Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8844287Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8845159Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8845515Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8846397Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8846943Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8848416Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8849205Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8850174Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8850595Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8851508Z [rank1]:E1204 10:12:02.617000 97684 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8851966Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8853769Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T10:13:48.8854176Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8854836Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8855913Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8856312Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8857025Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8857568Z [rank1]:E1204 10:12:02.617000 97684 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.8858023Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8858547Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8859551Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8860055Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8861041Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8861475Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8862425Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8862914Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8863893Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8864382Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8865346Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8865886Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8866853Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8867287Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8868764Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 3. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T10:13:48.8869085Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8869669Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8870626Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8870969Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8871609Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8872086Z [rank3]:E1204 10:12:02.617000 97686 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.8872488Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8872951Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8873837Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8874279Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8875174Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", 
line 772, in wrapper 2025-12-04T10:13:48.8875526Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8876367Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8876799Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8877662Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8878094Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8879254Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8879699Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8880664Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8881150Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8882848Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 611254272 and is now 651100160. 
2025-12-04T10:13:48.8883205Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8883864Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8884993Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8885350Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8886068Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8886611Z [rank2]:E1204 10:12:02.617000 97685 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.8886715Z dist init r=3, world=4 2025-12-04T10:13:48.8886810Z dist init r=0, world=4 2025-12-04T10:13:48.8886905Z dist init r=1, world=4 2025-12-04T10:13:48.8887004Z dist init r=2, world=4 2025-12-04T10:13:48.8888162Z [rank0]:[W1204 10:12:03.133724273 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.8888303Z FAILED [12.8218s] [100%] 2025-12-04T10:13:48.8888311Z 2025-12-04T10:13:48.8888454Z =================================== FAILURES =================================== 2025-12-04T10:13:48.8888751Z ________ TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda ________ 2025-12-04T10:13:48.8888875Z Traceback (most recent call last): 2025-12-04T10:13:48.8889415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.8889525Z self._join_processes(fn) 2025-12-04T10:13:48.8890111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.8890248Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.8890890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.8891002Z raise RuntimeError(error) 2025-12-04T10:13:48.8891231Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.8891459Z Traceback (most recent call last): 2025-12-04T10:13:48.8891931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8892030Z getattr(self, test_name)() 2025-12-04T10:13:48.8892495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8892571Z fn() 2025-12-04T10:13:48.8893018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8893108Z method(*args, **kwargs) 2025-12-04T10:13:48.8893777Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T10:13:48.8893888Z method(*args, **kwargs) 2025-12-04T10:13:48.8894419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8894518Z with policy(): 2025-12-04T10:13:48.8895017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8895118Z raise RuntimeError(msg) 2025-12-04T10:13:48.8896297Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 2025-12-04T10:13:48.8896348Z 2025-12-04T10:13:48.8896561Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8897200Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8897208Z 2025-12-04T10:13:48.8897465Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8897473Z 2025-12-04T10:13:48.8897633Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.8897754Z Traceback (most recent call last): 2025-12-04T10:13:48.8898298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8898408Z getattr(self, test_name)() 2025-12-04T10:13:48.8898936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8899023Z fn() 2025-12-04T10:13:48.8899534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8899635Z method(*args, **kwargs) 2025-12-04T10:13:48.8900134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8900269Z method(*args, **kwargs) 2025-12-04T10:13:48.8900763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8900859Z with policy(): 2025-12-04T10:13:48.8901359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8901465Z raise RuntimeError(msg) 2025-12-04T10:13:48.8902648Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 3. CUDA driver allocated memory was 604962816 and is now 651100160. 
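Two other warnings in these runs concern process-group lifecycle rather than memory: c10d_logger.py notes that barrier() had to infer the device from the current context and suggests passing device_id to init_process_group, and ProcessGroupNCCL.cpp warns that destroy_process_group() was never called before exit. A minimal sketch of the suggested init/teardown pattern follows; the env:// rendezvous and the LOCAL_RANK variable are assumptions for illustration, not taken from this harness.

    import os
    import torch
    import torch.distributed as dist

    def main() -> None:
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)
        # device_id binds the group to one CUDA device, which silences the
        # barrier() warning about guessing the device from the current context.
        dist.init_process_group(
            backend="nccl",
            init_method="env://",
            device_id=torch.device("cuda", local_rank),
        )
        try:
            dist.barrier()
            # ... test or training body ...
        finally:
            # Explicit teardown avoids the ProcessGroupNCCL resource-leak warning.
            dist.destroy_process_group()

    if __name__ == "__main__":
        main()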
2025-12-04T10:13:48.8902684Z 2025-12-04T10:13:48.8902902Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8903547Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8903553Z 2025-12-04T10:13:48.8903814Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8903820Z 2025-12-04T10:13:48.8903824Z 2025-12-04T10:13:48.8904048Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.8904305Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.8905101Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-64a83fa5a2cd03db.xml - 2025-12-04T10:13:48.8905275Z =========================== short test summary info ============================ 2025-12-04T10:13:48.8906161Z FAILED [12.8218s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.8906273Z Traceback (most recent call last): 2025-12-04T10:13:48.8906778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8906873Z getattr(self, test_name)() 2025-12-04T10:13:48.8907351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8907428Z fn() 2025-12-04T10:13:48.8907872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8907994Z method(*args, **kwargs) 2025-12-04T10:13:48.8908437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8908531Z method(*args, **kwargs) 2025-12-04T10:13:48.8908970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8909054Z with policy(): 2025-12-04T10:13:48.8909509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8909600Z raise RuntimeError(msg) 2025-12-04T10:13:48.8910638Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
2025-12-04T10:13:48.8910651Z 2025-12-04T10:13:48.8910838Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8911403Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8911408Z 2025-12-04T10:13:48.8911673Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8911678Z 2025-12-04T10:13:48.8911815Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.8911926Z Traceback (most recent call last): 2025-12-04T10:13:48.8912405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8912500Z getattr(self, test_name)() 2025-12-04T10:13:48.8912978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8913056Z fn() 2025-12-04T10:13:48.8913499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8913594Z method(*args, **kwargs) 2025-12-04T10:13:48.8914059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8914159Z method(*args, **kwargs) 2025-12-04T10:13:48.8914597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8914682Z with policy(): 2025-12-04T10:13:48.8915133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8915227Z raise RuntimeError(msg) 2025-12-04T10:13:48.8916268Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 3. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T10:13:48.8916282Z 2025-12-04T10:13:48.8916467Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8917029Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8917036Z 2025-12-04T10:13:48.8917294Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8917448Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
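The repro banner printed with every failure gives the exact command to run from the repo root, with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enabling the leak check and PYTORCH_PRINT_REPRO_ON_FAILURE=0 available to silence the banner itself. A small sketch of invoking it programmatically; the subprocess wrapper is illustrative, while the command line is the one printed above.

    import os
    import subprocess

    env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
    # Run the single failing test with the leak check enabled, from the repo root.
    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda",
        ],
        env=env,
        check=True,
    )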
2025-12-04T10:13:48.8917607Z ====================== 1 failed, 32 deselected in 13.04s ======================= 2025-12-04T10:13:48.8917689Z Got exit code 1 2025-12-04T10:13:48.8918178Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda 2025-12-04T10:13:48.8918565Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.8919113Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1d1edf2996f09e22.xml 2025-12-04T10:13:48.8919251Z ============================= test session starts ============================== 2025-12-04T10:13:48.8919562Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.8919656Z cachedir: .pytest_cache 2025-12-04T10:13:48.8920115Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.8920218Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.8920308Z configfile: pytest.ini 2025-12-04T10:13:48.8920781Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.8920969Z collecting ... collected 60 items / 31 deselected / 29 selected 2025-12-04T10:13:48.8921098Z stepcurrent: skipping 31 already run items. 2025-12-04T10:13:48.8921194Z Running 2 items in this shard 2025-12-04T10:13:48.8921200Z 2025-12-04T10:13:48.8922120Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda I1204 10:12:09.339000 97968 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 98020 2025-12-04T10:13:48.8922593Z I1204 10:12:09.340000 97968 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 98021 2025-12-04T10:13:48.8923025Z I1204 10:12:09.341000 97968 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 98022 2025-12-04T10:13:48.8923460Z I1204 10:12:09.342000 97968 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 98023 2025-12-04T10:13:48.8924553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.8924687Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.8925778Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.8925888Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.8926971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.8927079Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.8928163Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.8928273Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.8930072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8930164Z _warn_cpu_init() 2025-12-04T10:13:48.8931935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8932054Z _warn_cpu_init() 2025-12-04T10:13:48.8934088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8934194Z _warn_cpu_init() 2025-12-04T10:13:48.8936186Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.8936320Z _warn_cpu_init() 2025-12-04T10:13:48.8937302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T10:13:48.8937407Z return func(*args, **kwargs) 2025-12-04T10:13:48.8937869Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8938400Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8939431Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8939941Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8940921Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8941320Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8942283Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8942772Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8943769Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8944260Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8945211Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8945680Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8946663Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8947096Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8948565Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 720306176 and is now 737083392. 
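The transformer.py UserWarning that appears once per rank at the start of each run fires because TransformerEncoder defaults to enable_nested_tensor=True while the encoder layer was built without batch_first=True, so the nested-tensor fast path is disabled. A minimal sketch of a construction that avoids that particular warning; the dimensions are placeholders, not the test model's.

    import torch.nn as nn

    # Building the layer with batch_first=True satisfies the nested-tensor
    # fast-path check, so TransformerEncoder does not warn and fall back.
    encoder_layer = nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True)
    encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)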
2025-12-04T10:13:48.8948887Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8949473Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8950470Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8950824Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8951451Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8951935Z [rank0]:E1204 10:12:21.262000 98020 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.8952340Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8952834Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8953721Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8954165Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8955041Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8955396Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8956238Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8956701Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8957545Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8957976Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8958843Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8959234Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8960089Z [rank2]:E1204 10:12:21.262000 98022 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8960515Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8961986Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T10:13:48.8962311Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8962894Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8963912Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8964238Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8964871Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8965383Z [rank2]:E1204 10:12:21.262000 98022 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.8965786Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8966254Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8967148Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8967594Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8968463Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8968820Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8969690Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8970126Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8970967Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8971425Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8972269Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8972665Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8973746Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8974232Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8975892Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T10:13:48.8976297Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8976954Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8978077Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8978444Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8979403Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8979943Z [rank1]:E1204 10:12:21.262000 98021 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.8980401Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.8980925Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.8981927Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8982436Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.8983419Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.8983861Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.8984825Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8985314Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8986298Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.8986785Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.8987741Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.8988179Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.8989142Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.8989626Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.8991330Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 609157120 and is now 628031488. 
2025-12-04T10:13:48.8991684Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8992270Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.8993262Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.8993616Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.8994255Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.8994734Z [rank3]:E1204 10:12:21.263000 98023 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.8994827Z dist init r=0, world=4 2025-12-04T10:13:48.8994909Z dist init r=1, world=4 2025-12-04T10:13:48.8994990Z dist init r=3, world=4 2025-12-04T10:13:48.8995076Z dist init r=2, world=4 2025-12-04T10:13:48.8996092Z [rank0]:[W1204 10:12:21.790643816 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.8996187Z FAILED [14.5618s] [ 50%] 2025-12-04T10:13:48.8996195Z 2025-12-04T10:13:48.8996317Z =================================== FAILURES =================================== 2025-12-04T10:13:48.8996591Z ____ TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda ____ 2025-12-04T10:13:48.8996707Z Traceback (most recent call last): 2025-12-04T10:13:48.8997217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.8997324Z self._join_processes(fn) 2025-12-04T10:13:48.8997836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.8997961Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.8998525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.8998621Z raise RuntimeError(error) 2025-12-04T10:13:48.8998827Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.8998938Z Traceback (most recent call last): 2025-12-04T10:13:48.8999415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.8999520Z getattr(self, test_name)() 2025-12-04T10:13:48.8999984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9000058Z fn() 2025-12-04T10:13:48.9000508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9000600Z method(*args, **kwargs) 2025-12-04T10:13:48.9001048Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9001143Z method(*args, **kwargs) 2025-12-04T10:13:48.9001587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9001704Z with policy(): 2025-12-04T10:13:48.9002148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9002242Z raise RuntimeError(msg) 2025-12-04T10:13:48.9003310Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T10:13:48.9003316Z 2025-12-04T10:13:48.9003504Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9004111Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9004116Z 2025-12-04T10:13:48.9004372Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9004380Z 2025-12-04T10:13:48.9004523Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.9004632Z Traceback (most recent call last): 2025-12-04T10:13:48.9005111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9005212Z getattr(self, test_name)() 2025-12-04T10:13:48.9005683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9005757Z fn() 2025-12-04T10:13:48.9006206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9006298Z method(*args, **kwargs) 2025-12-04T10:13:48.9006740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9006834Z method(*args, **kwargs) 2025-12-04T10:13:48.9007275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9007367Z with policy(): 2025-12-04T10:13:48.9007837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9007932Z raise RuntimeError(msg) 2025-12-04T10:13:48.9009000Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 609157120 and is now 628031488. 
2025-12-04T10:13:48.9009031Z 2025-12-04T10:13:48.9009216Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9009821Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9009828Z 2025-12-04T10:13:48.9010055Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9010060Z 2025-12-04T10:13:48.9010065Z 2025-12-04T10:13:48.9010265Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.9010492Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.9011191Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1d1edf2996f09e22.xml - 2025-12-04T10:13:48.9011343Z =========================== short test summary info ============================ 2025-12-04T10:13:48.9012086Z FAILED [14.5618s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.9012196Z Traceback (most recent call last): 2025-12-04T10:13:48.9012679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9012802Z getattr(self, test_name)() 2025-12-04T10:13:48.9013344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9013421Z fn() 2025-12-04T10:13:48.9014069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9014175Z method(*args, **kwargs) 2025-12-04T10:13:48.9014673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9014778Z method(*args, **kwargs) 2025-12-04T10:13:48.9015380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9015475Z with policy(): 2025-12-04T10:13:48.9015986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9016090Z raise RuntimeError(msg) 2025-12-04T10:13:48.9017300Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 628031488. 
2025-12-04T10:13:48.9017306Z 2025-12-04T10:13:48.9017515Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9018189Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9018194Z 2025-12-04T10:13:48.9018459Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9018464Z 2025-12-04T10:13:48.9018620Z Process 3 exited with error code 10 and exception: 2025-12-04T10:13:48.9018745Z Traceback (most recent call last): 2025-12-04T10:13:48.9019314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9019418Z getattr(self, test_name)() 2025-12-04T10:13:48.9019955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9020038Z fn() 2025-12-04T10:13:48.9020534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9020671Z method(*args, **kwargs) 2025-12-04T10:13:48.9021168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9021274Z method(*args, **kwargs) 2025-12-04T10:13:48.9021773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9021870Z with policy(): 2025-12-04T10:13:48.9022379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9022482Z raise RuntimeError(msg) 2025-12-04T10:13:48.9023689Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 609157120 and is now 628031488. 2025-12-04T10:13:48.9023698Z 2025-12-04T10:13:48.9023906Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9024578Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9024584Z 2025-12-04T10:13:48.9024851Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9025364Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.9025658Z ====================== 1 failed, 31 deselected in 14.78s ======================= 2025-12-04T10:13:48.9025749Z Got exit code 1 2025-12-04T10:13:48.9025846Z Retrying single test... 
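The failure above comes from PyTorch's CUDA memory-leak check (the same check the log's repro hint enables with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1): the harness records per-device caching-allocator and driver memory before the test body and fails if both are higher afterwards, which is why device 1 reporting 512 -> 80384 bytes in the caching allocator and roughly 605 MB -> 628 MB of driver memory makes every rank exit with code 10. A minimal sketch of that idea, assuming a CUDA-capable machine; check_for_cuda_leak is a hypothetical helper for illustration, not the CudaMemoryLeakCheck implementation in torch/testing/_internal/common_utils.py:

# Simplified illustration of the leak check seen in the log: snapshot
# per-device memory before and after a test body and complain if both
# the caching allocator and the driver report growth.
import gc
import torch

def check_for_cuda_leak(test_fn, device: int = 0) -> None:
    torch.cuda.synchronize(device)
    caching_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)
    driver_before = total - free_before

    test_fn()

    # Roughly what the real harness does before comparing: collect garbage
    # and return cached blocks so only genuinely live memory is counted.
    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    caching_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after

    if caching_after > caching_before and driver_after > driver_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: caching allocator "
            f"{caching_before} -> {caching_after}, driver "
            f"{driver_before} -> {driver_after}"
        )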
2025-12-04T10:13:48.9026517Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d210f9114da1dd7.xml 2025-12-04T10:13:48.9026655Z ============================= test session starts ============================== 2025-12-04T10:13:48.9026957Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.9027062Z cachedir: .pytest_cache 2025-12-04T10:13:48.9027544Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.9027656Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.9027744Z configfile: pytest.ini 2025-12-04T10:13:48.9028215Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.9028408Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.9029080Z stepcurrent: skipping 31 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9029183Z Running 1 items in this shard 2025-12-04T10:13:48.9029188Z 2025-12-04T10:13:48.9030108Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda I1204 10:12:28.329000 98305 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 98357 2025-12-04T10:13:48.9030549Z I1204 10:12:28.330000 98305 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 98358 2025-12-04T10:13:48.9030993Z I1204 10:12:28.331000 98305 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 98359 2025-12-04T10:13:48.9031453Z I1204 10:12:28.332000 98305 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 98360 2025-12-04T10:13:48.9032555Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9032691Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9033784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9033900Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9035802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9036016Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9038007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9038222Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9041378Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.9041685Z _warn_cpu_init() 2025-12-04T10:13:48.9045408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.9045621Z _warn_cpu_init() 2025-12-04T10:13:48.9049330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.9049526Z _warn_cpu_init() 2025-12-04T10:13:48.9053431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.9053793Z _warn_cpu_init() 2025-12-04T10:13:48.9055645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T10:13:48.9055940Z return func(*args, **kwargs) 2025-12-04T10:13:48.9056795Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9057748Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9059567Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9060649Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9062541Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9063305Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9065231Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9066146Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9067832Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9068696Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9070299Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9070890Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9071974Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9072517Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9074071Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 716111872 and is now 737083392. 
2025-12-04T10:13:48.9074424Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9075038Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9076108Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9076445Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9077156Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9077671Z [rank0]:E1204 10:12:40.257000 98357 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.9078091Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9078804Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9079965Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9080479Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9081462Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9081860Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9082809Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9083296Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9084337Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9084818Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9085776Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9086217Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9087217Z [rank2]:E1204 10:12:40.258000 98359 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9087704Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9089350Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T10:13:48.9089716Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9090368Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9091587Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9091940Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9092574Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9093054Z [rank2]:E1204 10:12:40.258000 98359 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.9093554Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9094248Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9095243Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9095754Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9096734Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9097134Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9098090Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9098608Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9099564Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9100043Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9100995Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9101464Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9102431Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9102915Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9104556Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T10:13:48.9104921Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9105679Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9106843Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9107158Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9107796Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9108323Z [rank1]:E1204 10:12:40.259000 98358 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.9108717Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9109187Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9110068Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9110519Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9111387Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9111746Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9112624Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9113052Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9113905Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9114335Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9115208Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9115603Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9116457Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9116886Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9118351Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 611254272 and is now 628031488. 
2025-12-04T10:13:48.9118679Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9119283Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9120285Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9120598Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9121263Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9121740Z [rank3]:E1204 10:12:40.259000 98360 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.9121832Z dist init r=0, world=4 2025-12-04T10:13:48.9121921Z dist init r=2, world=4 2025-12-04T10:13:48.9122007Z dist init r=3, world=4 2025-12-04T10:13:48.9122091Z dist init r=1, world=4 2025-12-04T10:13:48.9123116Z [rank0]:[W1204 10:12:40.774090167 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.9123204Z FAILED [14.5405s] [100%] 2025-12-04T10:13:48.9123213Z 2025-12-04T10:13:48.9123347Z =================================== FAILURES =================================== 2025-12-04T10:13:48.9123621Z ____ TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda ____ 2025-12-04T10:13:48.9123731Z Traceback (most recent call last): 2025-12-04T10:13:48.9124220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.9124343Z self._join_processes(fn) 2025-12-04T10:13:48.9124866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.9124992Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.9125521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.9125623Z raise RuntimeError(error) 2025-12-04T10:13:48.9125828Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.9125930Z Traceback (most recent call last): 2025-12-04T10:13:48.9126434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9126530Z getattr(self, test_name)() 2025-12-04T10:13:48.9127010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9127087Z fn() 2025-12-04T10:13:48.9127535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9127636Z method(*args, **kwargs) 2025-12-04T10:13:48.9128079Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9128170Z method(*args, **kwargs) 2025-12-04T10:13:48.9128619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9128705Z with policy(): 2025-12-04T10:13:48.9129161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9129255Z raise RuntimeError(msg) 2025-12-04T10:13:48.9130350Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 716111872 and is now 737083392. 2025-12-04T10:13:48.9130363Z 2025-12-04T10:13:48.9130550Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9131145Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9131178Z 2025-12-04T10:13:48.9131413Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9131418Z 2025-12-04T10:13:48.9131560Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.9131669Z Traceback (most recent call last): 2025-12-04T10:13:48.9132144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9132240Z getattr(self, test_name)() 2025-12-04T10:13:48.9132711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9132786Z fn() 2025-12-04T10:13:48.9133281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9133383Z method(*args, **kwargs) 2025-12-04T10:13:48.9134025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9134137Z method(*args, **kwargs) 2025-12-04T10:13:48.9134632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9134726Z with policy(): 2025-12-04T10:13:48.9135237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9135378Z raise RuntimeError(msg) 2025-12-04T10:13:48.9136579Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 
2025-12-04T10:13:48.9136594Z 2025-12-04T10:13:48.9136808Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9137474Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9137482Z 2025-12-04T10:13:48.9137745Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9137750Z 2025-12-04T10:13:48.9137784Z 2025-12-04T10:13:48.9138002Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.9138270Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.9139064Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d210f9114da1dd7.xml - 2025-12-04T10:13:48.9139228Z =========================== short test summary info ============================ 2025-12-04T10:13:48.9140071Z FAILED [14.5405s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.9140190Z Traceback (most recent call last): 2025-12-04T10:13:48.9140743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9140852Z getattr(self, test_name)() 2025-12-04T10:13:48.9141384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9141479Z fn() 2025-12-04T10:13:48.9142009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9142113Z method(*args, **kwargs) 2025-12-04T10:13:48.9142616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9142715Z method(*args, **kwargs) 2025-12-04T10:13:48.9143216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9143340Z with policy(): 2025-12-04T10:13:48.9143843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9143956Z raise RuntimeError(msg) 2025-12-04T10:13:48.9145168Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 716111872 and is now 737083392. 
2025-12-04T10:13:48.9145176Z 2025-12-04T10:13:48.9145394Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9146252Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9146257Z 2025-12-04T10:13:48.9146487Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9146500Z 2025-12-04T10:13:48.9146642Z Process 2 exited with error code 10 and exception: 2025-12-04T10:13:48.9146748Z Traceback (most recent call last): 2025-12-04T10:13:48.9147241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9147364Z getattr(self, test_name)() 2025-12-04T10:13:48.9147840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9147921Z fn() 2025-12-04T10:13:48.9148363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9148463Z method(*args, **kwargs) 2025-12-04T10:13:48.9148901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9148992Z method(*args, **kwargs) 2025-12-04T10:13:48.9149437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9149520Z with policy(): 2025-12-04T10:13:48.9149992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9150097Z raise RuntimeError(msg) 2025-12-04T10:13:48.9151166Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T10:13:48.9151172Z 2025-12-04T10:13:48.9151362Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9151955Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9151962Z 2025-12-04T10:13:48.9152196Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9152354Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.9152508Z ====================== 1 failed, 32 deselected in 14.76s ======================= 2025-12-04T10:13:48.9152600Z Got exit code 1 2025-12-04T10:13:48.9152690Z Retrying single test... 
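The first retry hits the same leak and a second retry starts below. The surrounding warnings also point at the usual hygiene for this kind of multi-process FSDP test: pass device_id to init_process_group (silences the barrier() warning), pass device_id to FSDP so sharding initialization runs on the GPU instead of CPU (silences _warn_cpu_init), and call destroy_process_group() before the process exits (silences the ProcessGroupNCCL shutdown warning). A generic per-rank sketch following those recommendations, assuming a torchrun launch that sets LOCAL_RANK; this is not the test's own code, and clearing the warnings does not by itself explain the leak in the CPU-offload + SHARD_GRAD_OP path:

# Generic per-rank skeleton following the warnings above: device_id
# everywhere and an explicit destroy_process_group() on the way out.
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    CPUOffload,
    FullyShardedDataParallel as FSDP,
    ShardingStrategy,
)

def main() -> None:
    local_rank = int(os.environ["LOCAL_RANK"])
    device = torch.device("cuda", local_rank)
    torch.cuda.set_device(device)

    # Passing device_id avoids the "barrier(): using the device under
    # current context" warning from c10d_logger.py in the log.
    dist.init_process_group("nccl", device_id=device)

    model = torch.nn.Linear(1024, 1024)
    # device_id moves FSDP's sharding initialization onto the GPU,
    # addressing the _warn_cpu_init() warning from _init_utils.py.
    fsdp_model = FSDP(
        model,
        device_id=device,
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
        cpu_offload=CPUOffload(offload_params=True),
    )

    out = fsdp_model(torch.randn(8, 1024, device=device))
    out.sum().backward()

    # Explicit teardown; skipping this triggers the ProcessGroupNCCL
    # "destroy_process_group() was not called" warning seen above.
    dist.barrier()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()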
2025-12-04T10:13:48.9153280Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ffec29f281535337.xml 2025-12-04T10:13:48.9153424Z ============================= test session starts ============================== 2025-12-04T10:13:48.9153724Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.9153822Z cachedir: .pytest_cache 2025-12-04T10:13:48.9154279Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.9154409Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.9154505Z configfile: pytest.ini 2025-12-04T10:13:48.9154979Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.9155168Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.9155842Z stepcurrent: skipping 31 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9155938Z Running 1 items in this shard 2025-12-04T10:13:48.9155943Z 2025-12-04T10:13:48.9156871Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda I1204 10:12:47.309000 98642 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 98694 2025-12-04T10:13:48.9157313Z I1204 10:12:47.310000 98642 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 98695 2025-12-04T10:13:48.9157750Z I1204 10:12:47.311000 98642 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 98696 2025-12-04T10:13:48.9158189Z I1204 10:12:47.312000 98642 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 98697 2025-12-04T10:13:48.9159316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9159432Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9160522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9160641Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9162480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9162681Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9164109Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9164224Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9166123Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.9166216Z _warn_cpu_init() 2025-12-04T10:13:48.9168146Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.9168236Z _warn_cpu_init() 2025-12-04T10:13:48.9170155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.9170246Z _warn_cpu_init() 2025-12-04T10:13:48.9172120Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T10:13:48.9172211Z _warn_cpu_init() 2025-12-04T10:13:48.9173137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T10:13:48.9173341Z return func(*args, **kwargs) 2025-12-04T10:13:48.9173987Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9174524Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9175522Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9176023Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9177053Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9177450Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9178412Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9179088Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9180052Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9180536Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9181488Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9182004Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9182966Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9183452Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9185154Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 716111872 and is now 737083392. 
2025-12-04T10:13:48.9185526Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9186180Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9187299Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9187662Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9188378Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9188965Z [rank0]:E1204 10:12:59.217000 98694 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.9189413Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9189941Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9191003Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9191481Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9192353Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9192703Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9193553Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9193980Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9194834Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9195261Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9196125Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9196520Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9197372Z [rank2]:E1204 10:12:59.217000 98696 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9197837Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9199305Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 611254272 and is now 628031488. 2025-12-04T10:13:48.9199629Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9200210Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9201211Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9201530Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9202211Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9202691Z [rank2]:E1204 10:12:59.217000 98696 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.9203083Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9203551Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9204473Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9204922Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9205795Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9206143Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9206996Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9207428Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9208283Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9208742Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9209590Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9209986Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9210862Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9211296Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9212943Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 609157120 and is now 628031488. 2025-12-04T10:13:48.9213361Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9214168Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9215293Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9215698Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9216409Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9216956Z [rank3]:E1204 10:12:59.218000 98697 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.9217404Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9217968Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9218965Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9219470Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9220459Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9220852Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9221814Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9222295Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9223277Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9223766Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9224716Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9225200Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9226348Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9226816Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9228462Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 628031488. 
2025-12-04T10:13:48.9228789Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9229366Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9230386Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9230707Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9231332Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9231817Z [rank1]:E1204 10:12:59.218000 98695 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.9231904Z dist init r=1, world=4 2025-12-04T10:13:48.9232018Z dist init r=0, world=4 2025-12-04T10:13:48.9232110Z dist init r=3, world=4 2025-12-04T10:13:48.9232194Z dist init r=2, world=4 2025-12-04T10:13:48.9233218Z [rank0]:[W1204 10:12:59.733860681 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T10:13:48.9233306Z FAILED [14.3154s] [100%] 2025-12-04T10:13:48.9233312Z 2025-12-04T10:13:48.9233437Z =================================== FAILURES =================================== 2025-12-04T10:13:48.9233715Z ____ TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda ____ 2025-12-04T10:13:48.9233821Z Traceback (most recent call last): 2025-12-04T10:13:48.9234300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.9234402Z self._join_processes(fn) 2025-12-04T10:13:48.9234912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.9235043Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.9235596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.9235692Z raise RuntimeError(error) 2025-12-04T10:13:48.9235902Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.9236008Z Traceback (most recent call last): 2025-12-04T10:13:48.9236485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9236608Z getattr(self, test_name)() 2025-12-04T10:13:48.9237077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9237160Z fn() 2025-12-04T10:13:48.9237606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9237696Z method(*args, **kwargs) 2025-12-04T10:13:48.9238146Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9238234Z method(*args, **kwargs) 2025-12-04T10:13:48.9238678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9238761Z with policy(): 2025-12-04T10:13:48.9239210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9239310Z raise RuntimeError(msg) 2025-12-04T10:13:48.9240383Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T10:13:48.9240487Z 2025-12-04T10:13:48.9240679Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9241275Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9241281Z 2025-12-04T10:13:48.9241509Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9241515Z 2025-12-04T10:13:48.9241526Z 2025-12-04T10:13:48.9241718Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.9241948Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.9242698Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ffec29f281535337.xml - 2025-12-04T10:13:48.9242848Z =========================== short test summary info ============================ 2025-12-04T10:13:48.9243595Z FAILED [14.3154s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.9243707Z Traceback (most recent call last): 2025-12-04T10:13:48.9244189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9244291Z getattr(self, test_name)() 2025-12-04T10:13:48.9244762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9244839Z fn() 2025-12-04T10:13:48.9245294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9245388Z method(*args, **kwargs) 2025-12-04T10:13:48.9245840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9245932Z method(*args, **kwargs) 2025-12-04T10:13:48.9246396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9246487Z with policy(): 2025-12-04T10:13:48.9246932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9247026Z raise RuntimeError(msg) 2025-12-04T10:13:48.9248095Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T10:13:48.9248127Z 2025-12-04T10:13:48.9248315Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9248923Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9248928Z 2025-12-04T10:13:48.9249161Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9249315Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.9249474Z ====================== 1 failed, 32 deselected in 14.53s ======================= 2025-12-04T10:13:48.9249557Z Got exit code 1 2025-12-04T10:13:48.9250088Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T10:13:48.9250448Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.9250994Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-878c5ded0afd20c5.xml 2025-12-04T10:13:48.9251182Z ============================= test session starts ============================== 2025-12-04T10:13:48.9251489Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.9251586Z cachedir: .pytest_cache 2025-12-04T10:13:48.9252037Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.9252143Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.9252239Z configfile: pytest.ini 2025-12-04T10:13:48.9252710Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.9252902Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.9253029Z stepcurrent: skipping 32 already run items. 
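The "Got exit code 1", "FAILED CONSISTENTLY", and "continuing with the rest of the tests due to continue-through-error being set" lines describe the shard runner's handling of a failure: rerun the failing test in isolation, and only if it fails again report it as a consistent failure and keep going. The sketch below illustrates that pattern under those assumptions; it is not PyTorch's actual run_test.py implementation:

import subprocess

def run_pytest(test_id: str) -> int:
    return subprocess.run(["python", "-m", "pytest", "-x", test_id]).returncode

def run_with_retry(test_id: str, continue_through_error: bool = True) -> bool:
    if run_pytest(test_id) == 0:
        return True
    print("Got exit code 1, retrying single test...")
    if run_pytest(test_id) == 0:
        print(f"FLAKY: {test_id}")                 # passed on the isolated rerun
        return True
    print(f"FAILED CONSISTENTLY: {test_id}")
    if not continue_through_error:
        raise SystemExit(1)
    return False                                   # continue with the remaining tests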
2025-12-04T10:13:48.9253149Z Running 1 items in this shard 2025-12-04T10:13:48.9253156Z 2025-12-04T10:13:48.9254361Z distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda I1204 10:13:06.339000 98979 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 99031 2025-12-04T10:13:48.9254860Z I1204 10:13:06.340000 98979 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 99032 2025-12-04T10:13:48.9255344Z I1204 10:13:06.341000 98979 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 99033 2025-12-04T10:13:48.9255834Z I1204 10:13:06.342000 98979 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 99034 2025-12-04T10:13:48.9257085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9257216Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9258479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9258606Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9259596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:48.9259799Z {} 2025-12-04T10:13:48.9260125Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:48.9260341Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:48.9262052Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.9262217Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.9263201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:48.9263381Z {} 2025-12-04T10:13:48.9263695Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:48.9263914Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:48.9265608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:48.9265802Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.9266996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9267104Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9268218Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9268327Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9269207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:48.9269359Z {} 2025-12-04T10:13:48.9269640Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:48.9269835Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:48.9271339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.9271513Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.9272388Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:48.9272537Z {} 2025-12-04T10:13:48.9272826Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:48.9273042Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:48.9274556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
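The repeated enable_nested_tensor warnings above come from constructing TransformerEncoderLayer without batch_first=True. A small self-contained example of the constructor arguments the warning points at (dimensions are placeholders, unrelated to the test):

import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True)

x = torch.randn(8, 16, 64)                          # (batch, seq, feature) with batch_first=True
pad = torch.zeros(8, 16, dtype=torch.bool)          # padding mask enables the nested-tensor path
out = encoder(x, src_key_padding_mask=pad)
print(out.shape)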
2025-12-04T10:13:48.9274700Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.9275111Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9275582Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9276460Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9276914Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9277816Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9278169Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9279353Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9279850Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9280861Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9281343Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9282302Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9282745Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9283708Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9284195Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9285856Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 714014720 and is now 747569152. 
2025-12-04T10:13:48.9286220Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9286877Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9288009Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9288365Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9289084Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9289623Z [rank0]:E1204 10:13:13.370000 99031 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.9290079Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9290608Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9291682Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9292171Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9293045Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9293455Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9294561Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9295078Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9296035Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9296520Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9297479Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9297917Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9298886Z [rank1]:E1204 10:13:13.370000 99032 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9299372Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9301022Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 607059968 and is now 638517248. 2025-12-04T10:13:48.9301383Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9302082Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9303176Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9303539Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9304253Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9304792Z [rank1]:E1204 10:13:13.370000 99032 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.9305244Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9305783Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9306780Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9307260Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9308130Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9308486Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9309357Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9309785Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9310634Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9311059Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9311908Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9312304Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9313182Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9313615Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9315053Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 609157120 and is now 638517248. 2025-12-04T10:13:48.9315580Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9316514Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9318415Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9319035Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9320287Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9321221Z [rank3]:E1204 10:13:13.371000 99034 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.9321929Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9322891Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9324563Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9325418Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9327344Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9328171Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9329965Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9330800Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9332545Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9333496Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9335510Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9336357Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9338388Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9339347Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9342378Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 611254272 and is now 638517248. 
2025-12-04T10:13:48.9343163Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9344351Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9346551Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9347169Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9348396Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9349286Z [rank2]:E1204 10:13:13.371000 99033 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.9349551Z dist init r=2, world=4 2025-12-04T10:13:48.9349725Z dist init r=1, world=4 2025-12-04T10:13:48.9349881Z dist init r=0, world=4 2025-12-04T10:13:48.9350025Z dist init r=3, world=4 2025-12-04T10:13:48.9350181Z FAILED [8.9186s] [100%] 2025-12-04T10:13:48.9350191Z 2025-12-04T10:13:48.9350409Z =================================== FAILURES =================================== 2025-12-04T10:13:48.9350899Z ______ TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda _______ 2025-12-04T10:13:48.9351096Z Traceback (most recent call last): 2025-12-04T10:13:48.9352037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.9352228Z self._join_processes(fn) 2025-12-04T10:13:48.9353233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T10:13:48.9353461Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.9354476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.9354656Z raise RuntimeError(error) 2025-12-04T10:13:48.9354977Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.9355146Z Traceback (most recent call last): 2025-12-04T10:13:48.9355702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9355807Z getattr(self, test_name)() 2025-12-04T10:13:48.9356284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9356366Z fn() 2025-12-04T10:13:48.9356810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9356901Z method(*args, **kwargs) 2025-12-04T10:13:48.9357416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9357507Z method(*args, **kwargs) 2025-12-04T10:13:48.9357948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9358039Z with policy(): 2025-12-04T10:13:48.9358484Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9358619Z raise RuntimeError(msg) 2025-12-04T10:13:48.9359864Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 607059968 and is now 638517248. 2025-12-04T10:13:48.9359872Z 2025-12-04T10:13:48.9360075Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9360687Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9360693Z 2025-12-04T10:13:48.9360940Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9360946Z 2025-12-04T10:13:48.9360950Z 2025-12-04T10:13:48.9361164Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.9361405Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T10:13:48.9362167Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-878c5ded0afd20c5.xml - 2025-12-04T10:13:48.9362325Z =========================== short test summary info ============================ 2025-12-04T10:13:48.9363081Z FAILED [8.9186s] distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T10:13:48.9363232Z Traceback (most recent call last): 2025-12-04T10:13:48.9363743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9363853Z getattr(self, test_name)() 2025-12-04T10:13:48.9364351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9364438Z fn() 2025-12-04T10:13:48.9364921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9365018Z method(*args, **kwargs) 2025-12-04T10:13:48.9365519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9365624Z method(*args, **kwargs) 2025-12-04T10:13:48.9366094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9366195Z with policy(): 2025-12-04T10:13:48.9366667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9366766Z raise RuntimeError(msg) 2025-12-04T10:13:48.9367933Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 607059968 and is now 638517248. 
2025-12-04T10:13:48.9367941Z 2025-12-04T10:13:48.9368128Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9368701Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9368709Z 2025-12-04T10:13:48.9368941Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9369139Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.9369303Z ======================= 1 failed, 32 deselected in 9.14s ======================= 2025-12-04T10:13:48.9369388Z Got exit code 1 2025-12-04T10:13:48.9369479Z Retrying single test... 2025-12-04T10:13:48.9370031Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6c1c7d4a809089a1.xml 2025-12-04T10:13:48.9370199Z ============================= test session starts ============================== 2025-12-04T10:13:48.9370512Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.9370610Z cachedir: .pytest_cache 2025-12-04T10:13:48.9371068Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.9371189Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.9371283Z configfile: pytest.ini 2025-12-04T10:13:48.9371762Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.9371952Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.9372584Z stepcurrent: skipping 32 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9372694Z Running 1 items in this shard 2025-12-04T10:13:48.9372699Z 2025-12-04T10:13:48.9373865Z distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda I1204 10:13:19.990000 99308 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 99360 2025-12-04T10:13:48.9374407Z I1204 10:13:19.991000 99308 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 99361 2025-12-04T10:13:48.9374897Z I1204 10:13:19.991000 99308 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 99362 2025-12-04T10:13:48.9375382Z I1204 10:13:19.992000 99308 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 99363 2025-12-04T10:13:48.9376629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9376754Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9377776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:48.9377952Z {} 2025-12-04T10:13:48.9378271Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 
2025-12-04T10:13:48.9378489Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:48.9380424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.9380602Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.9381834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9382043Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9383031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:48.9383199Z {} 2025-12-04T10:13:48.9383522Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:48.9383777Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:48.9385487Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.9385653Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.9386883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9387014Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9388004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:48.9388180Z {} 2025-12-04T10:13:48.9388493Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:48.9388745Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:48.9390566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
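The "Both mixed precision and an auto_wrap_policy were specified" warning refers to combining an FSDP MixedPrecision policy with an auto_wrap_policy, where some separately wrapped submodules end up with mixed precision disabled. A hedged sketch of that combination follows, assuming the process group is already initialized; the module classes and dtypes are placeholders rather than the test's configuration:

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.wrap import ModuleWrapPolicy

def wrap_with_mixed_precision(device: torch.device) -> FSDP:
    model = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
        num_layers=2,
    ).to(device)
    mp = MixedPrecision(
        param_dtype=torch.float16,
        reduce_dtype=torch.float16,
        buffer_dtype=torch.float16,
    )
    # Each TransformerEncoderLayer becomes its own FSDP instance via the wrap policy;
    # FSDP emits the warning seen above when some of those wrapped submodules are
    # kept in full precision.
    return FSDP(
        model,
        device_id=device,
        mixed_precision=mp,
        auto_wrap_policy=ModuleWrapPolicy({nn.TransformerEncoderLayer}),
    )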
2025-12-04T10:13:48.9390828Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.9391960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9392070Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9392952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:48.9393102Z {} 2025-12-04T10:13:48.9393382Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:48.9393578Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:48.9395088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.9395240Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.9395647Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9396147Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9397037Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9397485Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9398670Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9399019Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9399874Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9400303Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9401145Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9401588Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9402431Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9402859Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9403707Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9404143Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9405605Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 711917568 and is now 747569152. 2025-12-04T10:13:48.9405935Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9406516Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9407480Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9407808Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9408445Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9408959Z [rank0]:E1204 10:13:27.079000 99360 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.9409359Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9409826Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9410710Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9411188Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9412063Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9412413Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9413332Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in 
wrapper 2025-12-04T10:13:48.9413949Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9414908Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9415435Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9416390Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9416840Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9417803Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9418341Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9419968Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 609157120 and is now 638517248. 
2025-12-04T10:13:48.9420334Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9420994Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9422083Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9422456Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9423201Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9423749Z [rank1]:E1204 10:13:27.079000 99361 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.9424193Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9424721Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9425753Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9426328Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9427210Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9427555Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9428406Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9428838Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9429678Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9430148Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9430999Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9431397Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9432276Z [rank2]:E1204 10:13:27.080000 99362 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9432721Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9434153Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 607059968 and is now 638517248. 2025-12-04T10:13:48.9434475Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9435069Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9436036Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9436392Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9437023Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9437507Z [rank2]:E1204 10:13:27.080000 99362 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.9437933Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9438400Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9439289Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9439736Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9440618Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9440961Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9441817Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9442269Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9443119Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9443552Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9444394Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9444818Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9445667Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9446110Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9447539Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 604962816 and is now 638517248. 2025-12-04T10:13:48.9447860Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9448448Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9449431Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9449757Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9450385Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9450902Z [rank3]:E1204 10:13:27.081000 99363 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.9450989Z dist init r=3, world=4 2025-12-04T10:13:48.9451075Z dist init r=1, world=4 2025-12-04T10:13:48.9451163Z dist init r=2, world=4 2025-12-04T10:13:48.9451245Z dist init r=0, world=4 2025-12-04T10:13:48.9451330Z FAILED [8.9147s] [100%] 2025-12-04T10:13:48.9451335Z 2025-12-04T10:13:48.9451471Z =================================== FAILURES =================================== 2025-12-04T10:13:48.9451730Z ______ TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda _______ 2025-12-04T10:13:48.9451837Z Traceback (most recent call last): 2025-12-04T10:13:48.9452348Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.9452490Z self._join_processes(fn) 2025-12-04T10:13:48.9453502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 
2025-12-04T10:13:48.9453881Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.9454866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.9455079Z raise RuntimeError(error) 2025-12-04T10:13:48.9455309Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.9455440Z Traceback (most recent call last): 2025-12-04T10:13:48.9455976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9456082Z getattr(self, test_name)() 2025-12-04T10:13:48.9456629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9456714Z fn() 2025-12-04T10:13:48.9457219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9457327Z method(*args, **kwargs) 2025-12-04T10:13:48.9457856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9457969Z method(*args, **kwargs) 2025-12-04T10:13:48.9458466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9458565Z with policy(): 2025-12-04T10:13:48.9459079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9459181Z raise RuntimeError(msg) 2025-12-04T10:13:48.9460351Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 711917568 and is now 747569152. 2025-12-04T10:13:48.9460371Z 2025-12-04T10:13:48.9460582Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9461227Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9461236Z 2025-12-04T10:13:48.9461509Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9461515Z 2025-12-04T10:13:48.9461551Z 2025-12-04T10:13:48.9461770Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.9462040Z Process 0 terminated with exit code 10, terminating remaining processes. 
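[Note] The RuntimeError above is raised by PyTorch's CUDA memory-leak checker, enabled for this shard via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1; it compares caching-allocator usage on each device before and after the test body. The following is only a minimal illustrative sketch of that before/after comparison, not the torch.testing._internal implementation; the byte counts in the comments are the ones reported in the log above.

    import torch
    from contextlib import contextmanager

    @contextmanager
    def cuda_mem_leak_check(device: int = 0):
        # Snapshot caching-allocator usage before the test body runs.
        torch.cuda.synchronize(device)
        before = torch.cuda.memory_allocated(device)   # e.g. 512 bytes in the log above
        yield
        # Compare against usage after the test; any growth is reported as a leak.
        torch.cuda.synchronize(device)
        after = torch.cuda.memory_allocated(device)    # e.g. 22528 bytes in the log above
        if after > before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: {before} -> {after} bytes"
            )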
2025-12-04T10:13:48.9462836Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6c1c7d4a809089a1.xml - 2025-12-04T10:13:48.9463104Z =========================== short test summary info ============================ 2025-12-04T10:13:48.9463917Z FAILED [8.9147s] distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.9464038Z Traceback (most recent call last): 2025-12-04T10:13:48.9464594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9464707Z getattr(self, test_name)() 2025-12-04T10:13:48.9465238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9465332Z fn() 2025-12-04T10:13:48.9465941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9466041Z method(*args, **kwargs) 2025-12-04T10:13:48.9466486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9466581Z method(*args, **kwargs) 2025-12-04T10:13:48.9467032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9467166Z with policy(): 2025-12-04T10:13:48.9467612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9467717Z raise RuntimeError(msg) 2025-12-04T10:13:48.9468755Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 711917568 and is now 747569152. 2025-12-04T10:13:48.9468760Z 2025-12-04T10:13:48.9468955Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9469519Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9469524Z 2025-12-04T10:13:48.9469782Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9469952Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:13:48.9470106Z ======================= 1 failed, 32 deselected in 9.13s ======================= 2025-12-04T10:13:48.9470199Z Got exit code 1 2025-12-04T10:13:48.9470291Z Retrying single test... 
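[Note] The `device_id` UserWarning repeated above (and again in the retry below) already names its remedy: pin each rank to an explicit CUDA device before constructing FSDP, or pass a device with an explicit index. A minimal sketch of that remedy follows, assuming `model` and `rank` come from the surrounding test harness.

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_explicit_device(model: torch.nn.Module, rank: int) -> FSDP:
        # Make the current device explicit, as the warning suggests ...
        torch.cuda.set_device(rank)
        # ... or pass a device with an explicit index instead of the bare "cuda".
        return FSDP(model, device_id=torch.device("cuda", rank))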
2025-12-04T10:13:48.9470841Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-78a9e7f81c58cb5d.xml 2025-12-04T10:13:48.9470987Z ============================= test session starts ============================== 2025-12-04T10:13:48.9471290Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.9471391Z cachedir: .pytest_cache 2025-12-04T10:13:48.9471843Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.9471951Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.9472051Z configfile: pytest.ini 2025-12-04T10:13:48.9472525Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.9472740Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T10:13:48.9473384Z stepcurrent: skipping 32 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9473482Z Running 1 items in this shard 2025-12-04T10:13:48.9473487Z 2025-12-04T10:13:48.9474376Z distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda I1204 10:13:33.669000 99637 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 99689 2025-12-04T10:13:48.9474843Z I1204 10:13:33.670000 99637 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 99690 2025-12-04T10:13:48.9475278Z I1204 10:13:33.671000 99637 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 99691 2025-12-04T10:13:48.9475718Z I1204 10:13:33.672000 99637 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 99692 2025-12-04T10:13:48.9476809Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9476925Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9477801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:48.9477972Z {} 2025-12-04T10:13:48.9478252Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:48.9478467Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:48.9480490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:48.9480652Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.9481958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9482084Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9483073Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:48.9483254Z {} 2025-12-04T10:13:48.9483570Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:48.9483787Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:48.9485488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.9485658Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.9486933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9487062Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9488049Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:48.9488260Z {} 2025-12-04T10:13:48.9488581Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:48.9488796Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:48.9490498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T10:13:48.9490668Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.9492051Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T10:13:48.9492170Z self.encoder = TransformerEncoder( 2025-12-04T10:13:48.9493048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T10:13:48.9493297Z {} 2025-12-04T10:13:48.9493588Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T10:13:48.9493962Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T10:13:48.9495672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T10:13:48.9495837Z device_from_device_id = _get_device_from_device_id( 2025-12-04T10:13:48.9496337Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9496877Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9497879Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9498393Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9499376Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9499782Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9500739Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9501260Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9502216Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9502698Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9503686Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9504123Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9505090Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9505682Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9507268Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 707723264 and is now 747569152. 2025-12-04T10:13:48.9507592Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9508201Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9509175Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9509495Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9510134Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9510638Z [rank0]:E1204 10:13:40.731000 99689 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T10:13:48.9511045Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9511513Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9512398Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9512852Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9513726Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9514078Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9514951Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in 
wrapper 2025-12-04T10:13:48.9515381Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9516234Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9516698Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9517546Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9517937Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9518792Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9519220Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9520659Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 607059968 and is now 638517248. 
2025-12-04T10:13:48.9521006Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9521587Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9522546Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9522865Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9523528Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9524011Z [rank2]:E1204 10:13:40.733000 99691 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T10:13:48.9524410Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9524877Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9525759Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9526216Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9527085Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9527464Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9528308Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9528732Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9529616Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9530042Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9530893Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9531280Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9532128Z [rank1]:E1204 10:13:40.734000 99690 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9532560Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9534281Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 604962816 and is now 638517248. 2025-12-04T10:13:48.9534679Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9535327Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9536450Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9536809Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9537529Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9538068Z [rank1]:E1204 10:13:40.734000 99690 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T10:13:48.9538515Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T10:13:48.9539040Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T10:13:48.9540036Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9540540Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T10:13:48.9541556Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9541951Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T10:13:48.9542906Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9543415Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9544369Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9544853Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T10:13:48.9545910Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9546409Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T10:13:48.9547262Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9547687Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T10:13:48.9549144Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 609157120 and is now 638517248. 2025-12-04T10:13:48.9549467Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9550050Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9551040Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9551361Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T10:13:48.9551991Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9552467Z [rank3]:E1204 10:13:40.734000 99692 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T10:13:48.9552553Z dist init r=0, world=4 2025-12-04T10:13:48.9552647Z dist init r=3, world=4 2025-12-04T10:13:48.9552732Z dist init r=1, world=4 2025-12-04T10:13:48.9552816Z dist init r=2, world=4 2025-12-04T10:13:48.9552902Z FAILED [8.8443s] [100%] 2025-12-04T10:13:48.9552907Z 2025-12-04T10:13:48.9553037Z =================================== FAILURES =================================== 2025-12-04T10:13:48.9553300Z ______ TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda _______ 2025-12-04T10:13:48.9553402Z Traceback (most recent call last): 2025-12-04T10:13:48.9553905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T10:13:48.9554009Z self._join_processes(fn) 2025-12-04T10:13:48.9554522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 
2025-12-04T10:13:48.9554652Z self._check_return_codes(fn, elapsed_time) 2025-12-04T10:13:48.9555208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T10:13:48.9555305Z raise RuntimeError(error) 2025-12-04T10:13:48.9555517Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.9555619Z Traceback (most recent call last): 2025-12-04T10:13:48.9556095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9556198Z getattr(self, test_name)() 2025-12-04T10:13:48.9556665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9556747Z fn() 2025-12-04T10:13:48.9557191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9557282Z method(*args, **kwargs) 2025-12-04T10:13:48.9557733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9557825Z method(*args, **kwargs) 2025-12-04T10:13:48.9558265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9558359Z with policy(): 2025-12-04T10:13:48.9558834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9558935Z raise RuntimeError(msg) 2025-12-04T10:13:48.9559967Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 707723264 and is now 747569152. 2025-12-04T10:13:48.9559973Z 2025-12-04T10:13:48.9560162Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9560733Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9560738Z 2025-12-04T10:13:48.9560994Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9561000Z 2025-12-04T10:13:48.9561005Z 2025-12-04T10:13:48.9561208Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:13:48.9561439Z Process 0 terminated with exit code 10, terminating remaining processes. 
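[Note] The traceback above ends in `_join_processes` / `_check_return_codes`: the harness spawns one worker per rank, waits for all of them, and turns a non-zero child exit code (10 here) into the test failure. The sketch below shows that general pattern using only the standard library, not the torch.testing._internal code itself; the function names are illustrative.

    import multiprocessing as mp

    def _run_rank(rank: int, world_size: int) -> None:
        # Per-rank body (process-group init and the actual test) would go here.
        pass

    def run_distributed_test(world_size: int = 4) -> None:
        procs = [mp.Process(target=_run_rank, args=(r, world_size)) for r in range(world_size)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # Surface any non-zero exit code as a test failure, as _check_return_codes does.
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")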
2025-12-04T10:13:48.9562148Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-78a9e7f81c58cb5d.xml - 2025-12-04T10:13:48.9562297Z =========================== short test summary info ============================ 2025-12-04T10:13:48.9563005Z FAILED [8.8443s] distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T10:13:48.9563117Z Traceback (most recent call last): 2025-12-04T10:13:48.9563602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T10:13:48.9563699Z getattr(self, test_name)() 2025-12-04T10:13:48.9564174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T10:13:48.9564251Z fn() 2025-12-04T10:13:48.9564734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9564824Z method(*args, **kwargs) 2025-12-04T10:13:48.9565265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:13:48.9565360Z method(*args, **kwargs) 2025-12-04T10:13:48.9565799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:13:48.9565915Z with policy(): 2025-12-04T10:13:48.9566363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:13:48.9566457Z raise RuntimeError(msg) 2025-12-04T10:13:48.9567506Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 707723264 and is now 747569152. 2025-12-04T10:13:48.9567511Z 2025-12-04T10:13:48.9567698Z To execute this test, run the following from the base repo dir: 2025-12-04T10:13:48.9568268Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9568273Z 2025-12-04T10:13:48.9568507Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:13:48.9568660Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
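[Note] The `_wrap_utils` UserWarning printed before each failure refers to passing both a MixedPrecision policy and an auto_wrap_policy to FSDP, in which case some auto-wrapped submodules get mixed precision disabled. The sketch below shows such a combination; the policy values are illustrative assumptions, not taken from this test.

    import functools
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
    from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

    def wrap_with_mixed_precision(model: torch.nn.Module) -> FSDP:
        # Combining these two arguments is what triggers the warning when some
        # submodules end up wrapped separately with mixed precision disabled.
        mp_policy = MixedPrecision(param_dtype=torch.float16)
        auto_wrap = functools.partial(size_based_auto_wrap_policy, min_num_params=1_000)
        return FSDP(model, mixed_precision=mp_policy, auto_wrap_policy=auto_wrap)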
2025-12-04T10:13:48.9568817Z ======================= 1 failed, 32 deselected in 9.06s ======================= 2025-12-04T10:13:48.9568897Z Got exit code 1 2025-12-04T10:13:48.9569411Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T10:13:48.9569777Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:13:48.9570320Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6f57499824dd1125.xml 2025-12-04T10:13:48.9570465Z ============================= test session starts ============================== 2025-12-04T10:13:48.9570766Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:13:48.9570857Z cachedir: .pytest_cache 2025-12-04T10:13:48.9571311Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:13:48.9571441Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:13:48.9571538Z configfile: pytest.ini 2025-12-04T10:13:48.9572004Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:13:48.9572190Z collecting ... collected 60 items / 33 deselected / 27 selected 2025-12-04T10:13:48.9572319Z stepcurrent: skipping 33 already run items. 2025-12-04T10:13:48.9572412Z Running 0 items in this shard 2025-12-04T10:13:48.9572417Z 2025-12-04T10:13:48.9573121Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6f57499824dd1125.xml - 2025-12-04T10:13:48.9573334Z ============================ 33 deselected in 0.02s ============================ 2025-12-04T10:13:48.9593674Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda', 
'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda'] 2025-12-04T10:13:48.9593772Z 2025-12-04T10:13:48.9594317Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_core 1/2 (test/test-reports/distributed.fsdp.test_fsdp_core_1.2_d577d9d07b48d18d_.log) 2025-12-04T10:13:48.9594324Z 2025-12-04T10:13:48.9594680Z Finished distributed/fsdp/test_fsdp_core 1/2 ... 
[2025-12-04 10:13:47.590264][4853.692389325], took 45.15min 2025-12-04T10:13:48.9595436Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4e48aa8d10589348.xml 2025-12-04T10:13:48.9596180Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3193e57821c2ebca.xml 2025-12-04T10:13:48.9596944Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9a7469c5b46925c2.xml 2025-12-04T10:13:48.9597739Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-227ae9e59104394c.xml 2025-12-04T10:13:48.9598481Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-38750597d70d3b79.xml 2025-12-04T10:13:48.9599230Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8d9e40030a96f20.xml 2025-12-04T10:13:48.9599976Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-257466dde9fb107b.xml 2025-12-04T10:13:48.9600725Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-94f5dd2e01869af2.xml 2025-12-04T10:13:48.9601463Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-42c522b0340c97ac.xml 2025-12-04T10:13:48.9602242Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-32205b0cc860e51d.xml 2025-12-04T10:13:48.9602978Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e08ad6962badbec0.xml 2025-12-04T10:13:48.9603721Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2ea67fcde569130f.xml 2025-12-04T10:13:48.9604470Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2525c9886ebe84d6.xml 2025-12-04T10:13:48.9605240Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-265cb7987b98bd4a.xml 2025-12-04T10:13:48.9609631Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0dad56460685f27c.xml 2025-12-04T10:13:48.9610428Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0a57eff14e5fabd3.xml 2025-12-04T10:13:48.9611192Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f9981ec75d7ffd49.xml 2025-12-04T10:13:48.9611930Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9f763e8043031072.xml 2025-12-04T10:13:48.9612679Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-02dc996acd3ff226.xml 2025-12-04T10:13:48.9613583Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a9525f3a3720890d.xml 2025-12-04T10:13:48.9614595Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0add77cc4faf0004.xml 2025-12-04T10:13:48.9615441Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-fb044786f28290de.xml 2025-12-04T10:13:48.9616309Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2d3368174cbc9b5a.xml 2025-12-04T10:13:48.9617155Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f48241fc1d70a928.xml 2025-12-04T10:13:48.9617988Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4808bec29186d3a1.xml 2025-12-04T10:13:48.9618825Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0fe450ea21eea83e.xml 2025-12-04T10:13:48.9619670Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4119ccbda03fb8bd.xml 2025-12-04T10:13:48.9620504Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-77506912df9607dd.xml 2025-12-04T10:13:48.9621339Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-71967038f6397bcb.xml 2025-12-04T10:13:48.9622206Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7aea602ded691711.xml 2025-12-04T10:13:48.9623046Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-588db22c786ffc0c.xml 2025-12-04T10:13:48.9623883Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d2c93dc13a89050c.xml 2025-12-04T10:13:48.9624774Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e269a47641789945.xml 2025-12-04T10:13:48.9625617Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d0e2108c889b6f40.xml 2025-12-04T10:13:48.9626487Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d7cc16231ece4156.xml 2025-12-04T10:13:48.9627232Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-01da52837cd28026.xml 2025-12-04T10:13:48.9627977Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aa4201c32172891c.xml 2025-12-04T10:13:48.9628722Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4da7b120579aed6b.xml 2025-12-04T10:13:48.9717501Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8c1fa5c204db7919.xml 2025-12-04T10:13:49.0009036Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-79d8c8140e8d4a45.xml 2025-12-04T10:13:49.0343183Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5b8cddc87d4e2da4.xml 2025-12-04T10:13:49.0666301Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-312ddbdab57572f7.xml 2025-12-04T10:13:49.0975977Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-fb1563559edf316c.xml 2025-12-04T10:13:49.1228980Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-63a7a52cd3aa8936.xml 2025-12-04T10:13:49.1516634Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a6324f00d63e140d.xml 2025-12-04T10:13:49.1827186Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0cab4f0cffa47b1f.xml 2025-12-04T10:13:49.2168576Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a997c4f2b1c679bc.xml 2025-12-04T10:13:49.2478245Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bb42278badc3bd05.xml 2025-12-04T10:13:49.2767919Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1e66930a4930311d.xml 2025-12-04T10:13:49.3077434Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-caca850dfa53af0d.xml 2025-12-04T10:13:49.3367567Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5ab6f8f72e0857a0.xml 2025-12-04T10:13:49.3667687Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-82a2bf200d1dcaa2.xml 2025-12-04T10:13:49.3977192Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a014b9bd1b37d049.xml 2025-12-04T10:13:49.4296522Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-95f17e90ca4b9755.xml 2025-12-04T10:13:49.4628551Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7ff8c73ee302c339.xml 2025-12-04T10:13:49.4908987Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0cb6a3efe573e986.xml 2025-12-04T10:13:49.5336820Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-44c122860a547cf4.xml 2025-12-04T10:13:49.5765793Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-668210e8e09c8dd9.xml 2025-12-04T10:13:49.6057576Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9448ec7a0a61b5a6.xml 2025-12-04T10:13:49.6367605Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8d6fd75ad2c1f260.xml 2025-12-04T10:13:49.6645978Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aa2eb835ecdd4375.xml 2025-12-04T10:13:49.7100932Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5e3a192bec2a8308.xml 2025-12-04T10:13:49.7368346Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c427df5212a82823.xml 2025-12-04T10:13:49.7647300Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a459d6ece2e0d396.xml 2025-12-04T10:13:49.7975873Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-82bd08cbda0b3168.xml 2025-12-04T10:13:49.8307248Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cb7d608d20fa1845.xml 2025-12-04T10:13:49.8608806Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3bf4168a6952dca5.xml 2025-12-04T10:13:49.8908161Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-69250cff44e166fa.xml 2025-12-04T10:13:49.9183197Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-311a7d97c78eb59e.xml 2025-12-04T10:13:49.9496773Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5f502e67619c39f3.xml 2025-12-04T10:13:49.9846689Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8eee5abee9febb4.xml 2025-12-04T10:13:50.0157364Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-acff5684d72dd2d3.xml 2025-12-04T10:13:50.0496429Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-72a03384c8cb338e.xml 2025-12-04T10:13:50.0758592Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-efc6da476f35386f.xml 2025-12-04T10:13:50.1069383Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ecdb0e90ac1c2bc1.xml 2025-12-04T10:13:50.1349444Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9524e9df873b8be0.xml 2025-12-04T10:13:50.1654036Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ad9e0258dc223929.xml 2025-12-04T10:13:50.1959065Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-17a67113f8be5d53.xml 2025-12-04T10:13:50.2287591Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c0681c30ea8c1a74.xml 2025-12-04T10:13:50.2596263Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-27fc2bea2cad5f2f.xml 2025-12-04T10:13:50.2870390Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-834db467a2bb808c.xml 2025-12-04T10:13:50.3158891Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f48f6c290350ef90.xml 2025-12-04T10:13:50.3477097Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1ee2f71fb8de6413.xml 2025-12-04T10:13:50.3807245Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7f8857b703d650c5.xml 2025-12-04T10:13:50.4114927Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-19e11b35947b0a14.xml 2025-12-04T10:13:50.4444831Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3b5d6f54eb5c8ad3.xml 2025-12-04T10:13:50.4790561Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b2eb1b61ddd90ac8.xml 2025-12-04T10:13:50.5091186Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e6810d3a1c38013d.xml 2025-12-04T10:13:50.5396066Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-53dff883b0afb17e.xml 2025-12-04T10:13:50.5695733Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2c92b48ae22d6a39.xml 2025-12-04T10:13:50.6036703Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bb06b260fb006313.xml 2025-12-04T10:13:50.6346233Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-29ffb7b96244526a.xml 2025-12-04T10:13:50.6655351Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-64a83fa5a2cd03db.xml 2025-12-04T10:13:50.6975745Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1d1edf2996f09e22.xml 2025-12-04T10:13:50.7327281Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d210f9114da1dd7.xml 2025-12-04T10:13:50.7656575Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ffec29f281535337.xml 2025-12-04T10:13:50.7959422Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-878c5ded0afd20c5.xml 2025-12-04T10:13:50.8249487Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6c1c7d4a809089a1.xml 2025-12-04T10:13:50.8571742Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-78a9e7f81c58cb5d.xml 2025-12-04T10:13:50.8849082Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6f57499824dd1125.xml 2025-12-04T10:13:51.4009837Z Uploading logs for 57116084892 to S3 2025-12-04T10:13:51.5079700Z Uploading artifacts took 0.60 seconds 2025-12-04T10:13:51.5080410Z distributed/fsdp/test_fsdp_core 1/2 failed! 
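[editor's note] On the "Parsing testcases for test report" steps above: each entry points at a pytest-generated JUnit XML file under test/test-reports/. The sketch below shows one minimal way to read such a report with the standard library; summarize_junit_report and the commented example path are illustrative only, not the CI's actual parsing code.

    import xml.etree.ElementTree as ET

    def summarize_junit_report(path):
        """Count tests, failures, errors, and skips in a pytest JUnit XML report."""
        root = ET.parse(path).getroot()
        # pytest emits <testsuites><testsuite ...> (or a bare <testsuite> in older versions);
        # iter() matches the root element too, so both layouts are covered.
        totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
        for suite in root.iter("testsuite"):
            for key in totals:
                totals[key] += int(suite.get(key, 0))
        return totals

    # Example (hypothetical local path):
    # print(summarize_junit_report(
    #     "test-reports/python-pytest/distributed.fsdp.test_fsdp_core/"
    #     "distributed.fsdp.test_fsdp_core-78a9e7f81c58cb5d.xml"))

Per-testcase details, including failure text such as the leak error above, live on the <testcase>/<failure> elements of the same files.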
2025-12-04T10:13:51.5082717Z Running distributed/algorithms/ddp_comm_hooks/test_ddp_hooks 1/1 ... [2025-12-04 10:13:51.508066][4857.610193741] 2025-12-04T10:13:51.5083934Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:13:51.5092625Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/algorithms/ddp_comm_hooks/test_ddp_hooks.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:13:51.508604] 2025-12-04T10:14:27.5640216Z 2025-12-04T10:14:27.5641680Z distributed/algorithms/ddp_comm_hooks/test_ddp_hooks 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.algorithms.ddp_comm_hooks.test_ddp_hooks_1.1_60114f563500ace9_.log 2025-12-04T10:14:27.5646931Z Running 6 items in this shard: test/distributed/algorithms/ddp_comm_hooks/test_ddp_hooks.py::DistributedDataParallelCommHookTest::test_ddp_comm_hook_allreduce_hook, test/distributed/algorithms/ddp_comm_hooks/test_ddp_hooks.py::DistributedDataParallelCommHookTest::test_ddp_comm_hook_fp16compress_hook, test/distributed/algorithms/ddp_comm_hooks/test_ddp_hooks.py::DistributedDataParallelCommHookTest::test_ddp_comm_hook_noop_hook, test/distributed/algorithms/ddp_comm_hooks/test_ddp_hooks.py::DistributedDataParallelCommHookTest::test_ddp_comm_hook_quantize_per_channel_hook, test/distributed/algorithms/ddp_comm_hooks/test_ddp_hooks.py::DistributedDataParallelCommHookTest::test_ddp_comm_hook_quantize_per_tensor_hook, test/distributed/algorithms/ddp_comm_hooks/test_ddp_hooks.py::DistributedDataParallelCommHookTest::test_is_last_hook 2025-12-04T10:14:27.5651544Z 2025-12-04T10:14:27.5652018Z Finished distributed/algorithms/ddp_comm_hooks/test_ddp_hooks 1/1 ... [2025-12-04 10:14:27.563524][4893.665649439], took 0.60min 2025-12-04T10:14:27.5844498Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.algorithms.ddp_comm_hooks.test_ddp_hooks/distributed.algorithms.ddp_comm_hooks.test_ddp_hooks-671b5e8f8d643201.xml 2025-12-04T10:14:27.6717698Z Running distributed/tensor/test_op_schema 1/1 ... [2025-12-04 10:14:27.671129][4893.773261074] 2025-12-04T10:14:27.6718326Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:14:31.4944497Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_op_schema.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:14:27.671487] 2025-12-04T10:14:31.4945801Z 2025-12-04T10:14:31.4946571Z distributed/tensor/test_op_schema 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_op_schema_1.1_d011062119cfcbab_.log 2025-12-04T10:14:31.4948434Z Running 2 items in this shard: test/distributed/tensor/test_op_schema.py::TestOpSchema::test_equality_checks_lists_of_dtensor_spec, test/distributed/tensor/test_op_schema.py::TestOpSchema::test_equality_respects_static_attributes 2025-12-04T10:14:31.4949576Z 2025-12-04T10:14:31.4950069Z Finished distributed/tensor/test_op_schema 1/1 ... 
[2025-12-04 10:14:31.493893][4897.596022895], took 0.06min 2025-12-04T10:14:31.5138529Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_op_schema/distributed.tensor.test_op_schema-bb5a16ac0960925a.xml 2025-12-04T10:14:31.5507365Z Running distributed/checkpoint/test_nested_dict 1/1 ... [2025-12-04 10:14:31.550016][4897.652147287] 2025-12-04T10:14:31.5508042Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:14:31.5509324Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_nested_dict.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:14:31.550369] 2025-12-04T10:14:35.4248233Z 2025-12-04T10:14:35.4249423Z distributed/checkpoint/test_nested_dict 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_nested_dict_1.1_997e08555820f5c0_.log 2025-12-04T10:14:35.4251328Z Running 2 items in this shard: test/distributed/checkpoint/test_nested_dict.py::TestFlattening::test_flattening_round_trip, test/distributed/checkpoint/test_nested_dict.py::TestFlattening::test_mapping 2025-12-04T10:14:35.4252366Z 2025-12-04T10:14:35.4252788Z Finished distributed/checkpoint/test_nested_dict 1/1 ... [2025-12-04 10:14:35.424300][4901.5264259], took 0.06min 2025-12-04T10:14:35.4446198Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_nested_dict/distributed.checkpoint.test_nested_dict-81f92522f1154383.xml 2025-12-04T10:14:35.4895334Z Running distributed/checkpoint/test_consolidate_hf_safetensors 1/1 ... [2025-12-04 10:14:35.488934][4901.591065552] 2025-12-04T10:14:35.4896114Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:14:35.4897518Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_consolidate_hf_safetensors.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:14:35.489284] 2025-12-04T10:15:16.0557659Z 2025-12-04T10:15:16.0559928Z distributed/checkpoint/test_consolidate_hf_safetensors 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_consolidate_hf_safetensors_1.1_58298622ce8a96a1_.log 2025-12-04T10:15:16.0572093Z Running 7 items in this shard: test/distributed/checkpoint/test_consolidate_hf_safetensors.py::TestConsolidateHFSafeTensors::test_calculate_max_contiguous_elements_valid_cases, test/distributed/checkpoint/test_consolidate_hf_safetensors.py::TestConsolidateHFSafeTensors::test_calculate_max_contiguous_elements_validations, test/distributed/checkpoint/test_consolidate_hf_safetensors.py::TestConsolidateHFSafeTensors::test_consolidate_one_file_with_two_ranks, test/distributed/checkpoint/test_consolidate_hf_safetensors.py::TestConsolidateHFSafeTensors::test_consolidate_to_one_file, test/distributed/checkpoint/test_consolidate_hf_safetensors.py::TestConsolidateHFSafeTensors::test_consolidate_to_two_files, test/distributed/checkpoint/test_consolidate_hf_safetensors.py::TestConsolidateHFSafeTensors::test_consolidate_with_two_ranks, test/distributed/checkpoint/test_consolidate_hf_safetensors.py::TestConsolidateHFSafeTensors::test_write_sub_tensor_to_file_optimized 2025-12-04T10:15:16.0581926Z 2025-12-04T10:15:16.0582974Z Finished distributed/checkpoint/test_consolidate_hf_safetensors 1/1 ... [2025-12-04 10:15:16.055422][4942.157552793], took 0.68min 2025-12-04T10:15:16.0775256Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_consolidate_hf_safetensors/distributed.checkpoint.test_consolidate_hf_safetensors-d914312b5a4148e2.xml 2025-12-04T10:15:16.1770579Z Running distributed/checkpoint/_experimental/test_barriers 1/1 ... [2025-12-04 10:15:16.176576][4942.278706467] 2025-12-04T10:15:16.1771319Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:15:16.1772885Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/_experimental/test_barriers.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:15:16.176917] 2025-12-04T10:15:20.0512942Z 2025-12-04T10:15:20.0514242Z distributed/checkpoint/_experimental/test_barriers 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint._experimental.test_barriers_1.1_b9cec98a3229522e_.log 2025-12-04T10:15:20.0516542Z Running 2 items in this shard: test/distributed/checkpoint/_experimental/test_barriers.py::TestBarriers::test_execute_barrier, test/distributed/checkpoint/_experimental/test_barriers.py::TestBarriers::test_tcpstore_barrier_initialization 2025-12-04T10:15:20.0518237Z 2025-12-04T10:15:20.0518708Z Finished distributed/checkpoint/_experimental/test_barriers 1/1 ... [2025-12-04 10:15:20.050759][4946.152890101], took 0.06min 2025-12-04T10:15:20.0712935Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint._experimental.test_barriers/distributed.checkpoint._experimental.test_barriers-d8f5a49da0f436d9.xml 2025-12-04T10:15:20.1176986Z Running distributed/pipelining/test_transformer 1/1 ... 
[2025-12-04 10:15:20.117473][4946.21960463] 2025-12-04T10:15:20.1177657Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:15:20.1180479Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/pipelining/test_transformer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:15:20.117829] 2025-12-04T10:15:25.8467993Z 2025-12-04T10:15:25.8470033Z distributed/pipelining/test_transformer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.pipelining.test_transformer_1.1_f2f1238de8a1b675_.log 2025-12-04T10:15:25.8471535Z Running 1 items in this shard: test/distributed/pipelining/test_transformer.py::TransformerTestsCUDA::test_ir_cuda 2025-12-04T10:15:25.8472415Z 2025-12-04T10:15:25.8472861Z Finished distributed/pipelining/test_transformer 1/1 ... [2025-12-04 10:15:25.846315][4951.94844069], took 0.10min 2025-12-04T10:15:25.8665844Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.pipelining.test_transformer/distributed.pipelining.test_transformer-e70c997724b03d0e.xml 2025-12-04T10:15:25.9269871Z Running distributed/flight_recorder/test_fr_analysis 1/1 ... [2025-12-04 10:15:25.926489][4952.02861955] 2025-12-04T10:15:25.9270552Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:15:25.9272014Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/flight_recorder/test_fr_analysis.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:15:25.926835] 2025-12-04T10:15:29.7008524Z 2025-12-04T10:15:29.7009728Z distributed/flight_recorder/test_fr_analysis 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.flight_recorder.test_fr_analysis_1.1_e16708747b4c5449_.log 2025-12-04T10:15:29.7213360Z Running 4 items in this shard: test/distributed/flight_recorder/test_fr_analysis.py::FlightRecorderEventTest::test_all_events, test/distributed/flight_recorder/test_fr_analysis.py::FlightRecorderEventTest::test_match_one_event, test/distributed/flight_recorder/test_fr_analysis.py::FlightMatchInfoTest::test_match_info, test/distributed/flight_recorder/test_fr_analysis.py::FlightRecorderE2ETest::testBuildDB 2025-12-04T10:15:29.7215672Z 2025-12-04T10:15:29.7216143Z Finished distributed/flight_recorder/test_fr_analysis 1/1 ... [2025-12-04 10:15:29.700349][4955.802479418], took 0.06min 2025-12-04T10:15:29.7217753Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.flight_recorder.test_fr_analysis/distributed.flight_recorder.test_fr_analysis-aba4e9f61260e449.xml 2025-12-04T10:15:29.7578519Z Running distributed/_composable/test_contract 1/1 ... [2025-12-04 10:15:29.757618][4955.85974932] 2025-12-04T10:15:29.7579404Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:15:29.7582038Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/test_contract.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:15:29.757981] 2025-12-04T10:15:33.5821010Z 2025-12-04T10:15:33.5822436Z distributed/_composable/test_contract 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.test_contract_1.1_5025aa03747f10c7_.log 2025-12-04T10:15:33.5825297Z Running 5 items in this shard: test/distributed/_composable/test_contract.py::TestContract::test_add_hooks, test/distributed/_composable/test_contract.py::TestContract::test_modify_fqn, test/distributed/_composable/test_contract.py::TestContract::test_multi_module_api, test/distributed/_composable/test_contract.py::TestContract::test_registry, test/distributed/_composable/test_contract.py::TestContract::test_state 2025-12-04T10:15:33.5827337Z 2025-12-04T10:15:33.5827744Z Finished distributed/_composable/test_contract 1/1 ... [2025-12-04 10:15:33.581757][4959.683882167], took 0.06min 2025-12-04T10:15:33.6030952Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.test_contract/distributed._composable.test_contract-43d2ccf9f44c35a5.xml 2025-12-04T10:15:33.6391608Z Running distributed/checkpoint/test_dedup_tensors 1/1 ... [2025-12-04 10:15:33.638643][4959.740774916] 2025-12-04T10:15:33.6392271Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:15:33.6393568Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_dedup_tensors.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:15:33.638987] 2025-12-04T10:15:37.4631876Z 2025-12-04T10:15:37.4633103Z distributed/checkpoint/test_dedup_tensors 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_dedup_tensors_1.1_0e84935618c63fcf_.log 2025-12-04T10:15:37.4634623Z Running 1 items in this shard: test/distributed/checkpoint/test_dedup_tensors.py::TestDedupTensor::test_dedup_shards 2025-12-04T10:15:37.4635228Z 2025-12-04T10:15:37.4635677Z Finished distributed/checkpoint/test_dedup_tensors 1/1 ... [2025-12-04 10:15:37.462763][4963.564893307], took 0.06min 2025-12-04T10:15:37.4840386Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_dedup_tensors/distributed.checkpoint.test_dedup_tensors-98db0ae6ec3ef072.xml 2025-12-04T10:15:37.5200064Z Running distributed/pipelining/test_pipe 1/1 ... [2025-12-04 10:15:37.519425][4963.621556544] 2025-12-04T10:15:37.5200680Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:15:37.5201942Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/pipelining/test_pipe.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:15:37.519774] 2025-12-04T10:15:41.8452822Z 2025-12-04T10:15:41.8454260Z distributed/pipelining/test_pipe 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.pipelining.test_pipe_1.1_59bc2fd17ffed9fc_.log 2025-12-04T10:15:41.8456699Z Running 3 items in this shard: test/distributed/pipelining/test_pipe.py::PipeTests::test_model_split_ModelClass0, test/distributed/pipelining/test_pipe.py::PipeTests::test_model_split_ModelClass1, test/distributed/pipelining/test_pipe.py::PipeTests::test_model_split_ModelClass2 2025-12-04T10:15:41.8458148Z 2025-12-04T10:15:41.8458522Z Finished distributed/pipelining/test_pipe 1/1 ... [2025-12-04 10:15:41.844836][4967.946965949], took 0.07min 2025-12-04T10:15:41.8659173Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.pipelining.test_pipe/distributed.pipelining.test_pipe-b65ad592f97073ad.xml 2025-12-04T10:15:41.9041629Z Running distributed/pipelining/test_backward 1/1 ... [2025-12-04 10:15:41.903627][4968.005757754] 2025-12-04T10:15:41.9042262Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:15:41.9043552Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/pipelining/test_backward.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:15:41.903979] 2025-12-04T10:15:47.2321964Z 2025-12-04T10:15:47.2323327Z distributed/pipelining/test_backward 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.pipelining.test_backward_1.1_0bf2315297ac9ba7_.log 2025-12-04T10:15:47.2327137Z Running 5 items in this shard: test/distributed/pipelining/test_backward.py::StageBackwardTestsCUDA::test_stage_backward_cuda, test/distributed/pipelining/test_backward.py::StageBackwardTestsCUDA::test_stage_backward_input_cuda, test/distributed/pipelining/test_backward.py::StageBackwardTestsCUDA::test_stage_backward_weight_cuda, test/distributed/pipelining/test_backward.py::StageBackwardTestsCUDA::test_stage_backward_weight_grad_validation_cuda, test/distributed/pipelining/test_backward.py::StageBackwardTestsCUDA::test_stage_backward_weight_multiple_iters_cuda 2025-12-04T10:15:47.2330086Z 2025-12-04T10:15:47.2330502Z Finished distributed/pipelining/test_backward 1/1 ... [2025-12-04 10:15:47.231562][4973.333685582], took 0.09min 2025-12-04T10:15:47.2533386Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.pipelining.test_backward/distributed.pipelining.test_backward-4f9205b7617a9aaf.xml 2025-12-04T10:15:47.2898716Z Running distributed/test_nvshmem_triton 1/1 ... [2025-12-04 10:15:47.289654][4973.391784726] 2025-12-04T10:15:47.2899349Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:15:47.2901773Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_nvshmem_triton.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:15:47.289995] 2025-12-04T10:16:09.9606584Z 2025-12-04T10:16:09.9607726Z distributed/test_nvshmem_triton 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_nvshmem_triton_1.1_f614bcc3c29a51a4_.log 2025-12-04T10:16:09.9626259Z Running 37 items in this shard: test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_alltoall, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_barrier, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_broadcast, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_fence, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_get_nbi_False, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_get_nbi_True, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_get_ring, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_minmax_reduce_bfloat16, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_minmax_reduce_float16, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_minmax_reduce_float32, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_minmax_reduce_float64, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_minmax_reduce_int16, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_minmax_reduce_int32, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_minmax_reduce_int64, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_minmax_reduce_int8, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_prod_reduce_bfloat16, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_prod_reduce_float16, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_prod_reduce_float32, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_prod_reduce_int16, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_prod_reduce_int32, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_prod_reduce_int64, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_prod_reduce_int8, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_put, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_put_signal_add, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_put_signal_set, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_quiet, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_signal_wait_until, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_sum_reduce_bfloat16, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_sum_reduce_float16, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_sum_reduce_float32, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_sum_reduce_int16, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_sum_reduce_int32, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_sum_reduce_int64, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_sum_reduce_int8, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_sum_reduce_uint8, test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_sync, 
test/distributed/test_nvshmem_triton.py::NVSHMEMTritonTest::test_triton_wait_until 2025-12-04T10:16:09.9642309Z 2025-12-04T10:16:09.9642681Z Finished distributed/test_nvshmem_triton 1/1 ... [2025-12-04 10:16:09.960192][4996.06232272], took 0.38min 2025-12-04T10:16:09.9816986Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_nvshmem_triton/distributed.test_nvshmem_triton-2d1da825c1a177a7.xml 2025-12-04T10:16:10.0723225Z Running distributed/tensor/test_dtensor 1/1 ... [2025-12-04 10:16:10.071794][4996.173924999] 2025-12-04T10:16:10.0723853Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:16:10.0725095Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_dtensor.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:16:10.072136] 2025-12-04T10:20:18.7215756Z 2025-12-04T10:20:18.7216862Z distributed/tensor/test_dtensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_dtensor_1.1_8ef92beff0d7f6af_.log 2025-12-04T10:20:18.7259434Z Running 86 items in this shard: test/distributed/tensor/test_dtensor.py::DTensorTest::test_dtensor_async_output, test/distributed/tensor/test_dtensor.py::DTensorTest::test_dtensor_constructor, test/distributed/tensor/test_dtensor.py::DTensorTest::test_dtensor_new_empty_strided, test/distributed/tensor/test_dtensor.py::DTensorTest::test_dtensor_properties, test/distributed/tensor/test_dtensor.py::DTensorTest::test_dtensor_save_load, test/distributed/tensor/test_dtensor.py::DTensorTest::test_dtensor_save_load_import, test/distributed/tensor/test_dtensor.py::DTensorTest::test_dtensor_spec_hash, test/distributed/tensor/test_dtensor.py::DTensorTest::test_dtensor_spec_read_only_after_set, test/distributed/tensor/test_dtensor.py::DTensorTest::test_dtensor_stride, test/distributed/tensor/test_dtensor.py::DTensorTest::test_from_local, test/distributed/tensor/test_dtensor.py::DTensorTest::test_from_local_negative_dim, test/distributed/tensor/test_dtensor.py::DTensorTest::test_from_local_then_to_local, test/distributed/tensor/test_dtensor.py::DTensorTest::test_from_local_uneven_sharding, test/distributed/tensor/test_dtensor.py::DTensorTest::test_from_local_uneven_sharding_raise_error, test/distributed/tensor/test_dtensor.py::DTensorTest::test_full_tensor_grad_hint, test/distributed/tensor/test_dtensor.py::DTensorTest::test_full_tensor_sync, test/distributed/tensor/test_dtensor.py::DTensorTest::test_meta_dtensor, test/distributed/tensor/test_dtensor.py::DTensorTest::test_modules_w_meta_dtensor, test/distributed/tensor/test_dtensor.py::DTensorTest::test_shard_tensor, test/distributed/tensor/test_dtensor.py::DTensorTest::test_shard_tensor_2d, test/distributed/tensor/test_dtensor.py::DTensorTest::test_to_local, test/distributed/tensor/test_dtensor.py::DTensorTest::test_to_local_grad_hint, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_dtensor_async_output, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_dtensor_constructor, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_dtensor_new_empty_strided, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_dtensor_properties, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_dtensor_save_load, 
test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_dtensor_save_load_import, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_dtensor_spec_hash, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_dtensor_spec_read_only_after_set, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_dtensor_stride, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_from_local, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_from_local_negative_dim, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_from_local_then_to_local, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_from_local_uneven_sharding, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_from_local_uneven_sharding_raise_error, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_full_tensor_grad_hint, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_full_tensor_sync, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_meta_dtensor, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_modules_w_meta_dtensor, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_shard_tensor, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_shard_tensor_2d, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_to_local, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_to_local_grad_hint, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_as_strided_identity, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_auto_implicit_replication, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_default_value_sub_mesh, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_device_mesh_nd, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_dtensor_2d_mesh, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_dtensor_api_device_mesh_context_manager, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_dtensor_cond, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_dtensor_device_mesh_device_conversion, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_dtensor_spec_local_shard_offset, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_from_local_sub_mesh, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_implicit_replication, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_inplace_on_local_tensor_view, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_metadata_consistency_check, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_redistribute_sub_mesh, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_vmap_embedding, test/distributed/tensor/test_dtensor.py::DTensorMeshTestWithLocalTensor::test_as_strided_identity, test/distributed/tensor/test_dtensor.py::DTensorMeshTestWithLocalTensor::test_auto_implicit_replication, test/distributed/tensor/test_dtensor.py::DTensorMeshTestWithLocalTensor::test_default_value_sub_mesh, test/distributed/tensor/test_dtensor.py::DTensorMeshTestWithLocalTensor::test_device_mesh_nd, test/distributed/tensor/test_dtensor.py::DTensorMeshTestWithLocalTensor::test_dtensor_2d_mesh, test/distributed/tensor/test_dtensor.py::DTensorMeshTestWithLocalTensor::test_dtensor_api_device_mesh_context_manager, 
test/distributed/tensor/test_dtensor.py::DTensorMeshTestWithLocalTensor::test_dtensor_cond, test/distributed/tensor/test_dtensor.py::DTensorMeshTestWithLocalTensor::test_dtensor_device_mesh_device_conversion, test/distributed/tensor/test_dtensor.py::DTensorMeshTestWithLocalTensor::test_dtensor_spec_local_shard_offset, test/distributed/tensor/test_dtensor.py::DTensorMeshTestWithLocalTensor::test_from_local_sub_mesh, test/distributed/tensor/test_dtensor.py::DTensorMeshTestWithLocalTensor::test_implicit_replication, test/distributed/tensor/test_dtensor.py::DTensorMeshTestWithLocalTensor::test_inplace_on_local_tensor_view, test/distributed/tensor/test_dtensor.py::DTensorMeshTestWithLocalTensor::test_metadata_consistency_check, test/distributed/tensor/test_dtensor.py::DTensorMeshTestWithLocalTensor::test_redistribute_sub_mesh, test/distributed/tensor/test_dtensor.py::DTensorMeshTestWithLocalTensor::test_vmap_embedding, test/distributed/tensor/test_dtensor.py::TestDTensorPlacementTypes::test_split_tensor_1D, test/distributed/tensor/test_dtensor.py::TestDTensorPlacementTypesWithLocalTensor::test_split_tensor_1D, test/distributed/tensor/test_dtensor.py::TestDTensorSpec::test_default_shard_order, test/distributed/tensor/test_dtensor.py::TestDTensorSpec::test_dtensor_spec_default_shard_order_generation, test/distributed/tensor/test_dtensor.py::TestDTensorSpec::test_dtensor_spec_print, test/distributed/tensor/test_dtensor.py::TestDTensorSpec::test_dtensor_spec_update, test/distributed/tensor/test_dtensor.py::TestDTensorSpec::test_dtensor_spec_with_invalid_shard_order, test/distributed/tensor/test_dtensor.py::TestDTensorSpecWithLocalTensor::test_default_shard_order, test/distributed/tensor/test_dtensor.py::TestDTensorSpecWithLocalTensor::test_dtensor_spec_default_shard_order_generation, test/distributed/tensor/test_dtensor.py::TestDTensorSpecWithLocalTensor::test_dtensor_spec_print, test/distributed/tensor/test_dtensor.py::TestDTensorSpecWithLocalTensor::test_dtensor_spec_update, test/distributed/tensor/test_dtensor.py::TestDTensorSpecWithLocalTensor::test_dtensor_spec_with_invalid_shard_order 2025-12-04T10:20:18.7300185Z 2025-12-04T10:20:18.7300587Z Finished distributed/tensor/test_dtensor 1/1 ... [2025-12-04 10:20:18.722455][5244.824566302], took 4.14min 2025-12-04T10:20:18.7449977Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_dtensor/distributed.tensor.test_dtensor-780171e06b9d081c.xml 2025-12-04T10:20:18.8692903Z Running distributed/test_p2p_ipc 1/1 ... [2025-12-04 10:20:18.868762][5244.970892971] 2025-12-04T10:20:18.8694229Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:20:18.8695562Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_p2p_ipc.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:20:18.869115] 2025-12-04T10:20:22.2922257Z 2025-12-04T10:20:22.2923386Z distributed/test_p2p_ipc 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_p2p_ipc_1.1_b1c3fbf05590da79_.log 2025-12-04T10:20:22.2924556Z Running 1 items in this shard: test/distributed/test_p2p_ipc.py::P2PIpcTest::test_p2p_ipc 2025-12-04T10:20:22.2925028Z 2025-12-04T10:20:22.2925619Z Finished distributed/test_p2p_ipc 1/1 ... 
[2025-12-04 10:20:22.291667][5248.393797724], took 0.06min 2025-12-04T10:20:22.3135766Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_p2p_ipc/distributed.test_p2p_ipc-22d7fd7242fa3e1d.xml 2025-12-04T10:20:22.3721179Z Running distributed/tensor/test_common_rules 1/1 ... [2025-12-04 10:20:22.371506][5248.473637739] 2025-12-04T10:20:22.3721836Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:20:22.3723297Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_common_rules.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:20:22.371856] 2025-12-04T10:20:30.9081807Z 2025-12-04T10:20:30.9082990Z distributed/tensor/test_common_rules 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_common_rules_1.1_14dc10d88c04ca47_.log 2025-12-04T10:20:30.9088777Z Running 10 items in this shard: test/distributed/tensor/test_common_rules.py::CommonRulesTest::test_einop_basic_propagation, test/distributed/tensor/test_common_rules.py::CommonRulesTest::test_einop_errors, test/distributed/tensor/test_common_rules.py::CommonRulesTest::test_einop_linearity, test/distributed/tensor/test_common_rules.py::CommonRulesTest::test_einop_merge_sharding, test/distributed/tensor/test_common_rules.py::CommonRulesTest::test_einop_multi_sharding_on_mesh_dim, test/distributed/tensor/test_common_rules.py::CommonRulesTest::test_einop_pointwise_propagation, test/distributed/tensor/test_common_rules.py::CommonRulesTest::test_pointwise_enforce_sharding_multi_sharding_on_mesh_dim, test/distributed/tensor/test_common_rules.py::CommonRulesTest::test_pointwise_multi_sharding_on_mesh_dim, test/distributed/tensor/test_common_rules.py::CommonRulesTest::test_pointwise_rules_broadcasting, test/distributed/tensor/test_common_rules.py::CommonRulesTest::test_pointwise_rules_suggestion 2025-12-04T10:20:30.9093821Z 2025-12-04T10:20:30.9094237Z Finished distributed/tensor/test_common_rules 1/1 ... [2025-12-04 10:20:30.907761][5257.009874813], took 0.14min 2025-12-04T10:20:30.9324151Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_common_rules/distributed.tensor.test_common_rules-f2e475ef5a58885a.xml 2025-12-04T10:20:31.0203845Z Running distributed/checkpoint/test_hf_safetensor_e2e 1/1 ... [2025-12-04 10:20:31.019763][5257.121895042] 2025-12-04T10:20:31.0204564Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:20:31.0206035Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_hf_safetensor_e2e.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:20:31.020112] 2025-12-04T10:21:10.8854771Z 2025-12-04T10:21:10.8857711Z distributed/checkpoint/test_hf_safetensor_e2e 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_hf_safetensor_e2e_1.1_5f8a368983958374_.log 2025-12-04T10:21:10.8865814Z Running 11 items in this shard: test/distributed/checkpoint/test_hf_safetensor_e2e.py::TestSingleRankSaveLoad::test_load, test/distributed/checkpoint/test_hf_safetensor_e2e.py::TestSingleRankSaveLoad::test_load_into_empty_dict, test/distributed/checkpoint/test_hf_safetensor_e2e.py::TestSingleRankSaveLoad::test_load_with_multiple_threads, test/distributed/checkpoint/test_hf_safetensor_e2e.py::TestSingleRankSaveLoad::test_quantized_checkpoint_loading, test/distributed/checkpoint/test_hf_safetensor_e2e.py::TestSingleRankSaveLoad::test_save, test/distributed/checkpoint/test_hf_safetensor_e2e.py::TestDistributedHFSafetensorsConsolidation::test_consolidate_to_one_file, test/distributed/checkpoint/test_hf_safetensor_e2e.py::TestDTensorReshardPlacementChange::test_1d_to_1d_reshard_placement_change, test/distributed/checkpoint/test_hf_safetensor_e2e.py::TestDTensorReshardPlacementChange::test_2d_to_2d_reshard_placement_change, test/distributed/checkpoint/test_hf_safetensor_e2e.py::TestDTensorReshardMeshChange::test_1d_to_2d_reshard_mesh_change, test/distributed/checkpoint/test_hf_safetensor_e2e.py::TestDTensorReshardMeshChange::test_2d_to_1d_reshard_mesh_change, test/distributed/checkpoint/test_hf_safetensor_e2e.py::TestDTensorReshardMeshChange::test_dtensor_checkpoint_resharding_with_empty_shard 2025-12-04T10:21:10.8872729Z 2025-12-04T10:21:10.8873169Z Finished distributed/checkpoint/test_hf_safetensor_e2e 1/1 ... [2025-12-04 10:21:10.885043][5296.987174066], took 0.66min 2025-12-04T10:21:10.9076503Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_hf_safetensor_e2e/distributed.checkpoint.test_hf_safetensor_e2e-49bc702c32e1be14.xml 2025-12-04T10:21:11.0363211Z Running distributed/tensor/test_dynamic 1/1 ... [2025-12-04 10:21:11.035820][5297.137952516] 2025-12-04T10:21:11.0363841Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:21:11.0365078Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_dynamic.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:21:11.036164] 2025-12-04T10:22:03.7829794Z 2025-12-04T10:22:03.7833070Z distributed/tensor/test_dynamic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_dynamic_1.1_6f95f3474f81ab92_.log 2025-12-04T10:22:03.7836218Z Running 4 items in this shard: test/distributed/tensor/test_dynamic.py::TestDynamic::test_embedding_fake_tensor_cache_enabled_False, test/distributed/tensor/test_dynamic.py::TestDynamic::test_embedding_fake_tensor_cache_enabled_True, test/distributed/tensor/test_dynamic.py::TestDynamicWithLocalTensor::test_embedding_fake_tensor_cache_enabled_False, test/distributed/tensor/test_dynamic.py::TestDynamicWithLocalTensor::test_embedding_fake_tensor_cache_enabled_True 2025-12-04T10:22:03.7838829Z 2025-12-04T10:22:03.7839213Z Finished distributed/tensor/test_dynamic 1/1 ... 
[2025-12-04 10:22:03.782469][5349.884595209], took 0.88min 2025-12-04T10:22:03.8050528Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_dynamic/distributed.tensor.test_dynamic-58a11920d980fced.xml 2025-12-04T10:22:03.8924772Z Running distributed/checkpoint/e2e/test_fsdp_ep 1/1 ... [2025-12-04 10:22:03.891839][5349.993969968] 2025-12-04T10:22:03.8925437Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:22:03.8926925Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/e2e/test_fsdp_ep.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:22:03.892189] 2025-12-04T10:22:12.6791895Z 2025-12-04T10:22:12.6793136Z distributed/checkpoint/e2e/test_fsdp_ep 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.e2e.test_fsdp_ep_1.1_f9c977572fffaad5_.log 2025-12-04T10:22:12.6794594Z Running 1 items in this shard: test/distributed/checkpoint/e2e/test_fsdp_ep.py::TestFSDPWithEP::test_e2e 2025-12-04T10:22:12.6795156Z 2025-12-04T10:22:12.6795577Z Finished distributed/checkpoint/e2e/test_fsdp_ep 1/1 ... [2025-12-04 10:22:12.678698][5358.780828236], took 0.15min 2025-12-04T10:22:12.7014225Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.e2e.test_fsdp_ep/distributed.checkpoint.e2e.test_fsdp_ep-90e84c8c71d0d519.xml 2025-12-04T10:22:12.7850245Z Running distributed/pipelining/test_unflatten 1/1 ... [2025-12-04 10:22:12.784442][5358.886573111] 2025-12-04T10:22:12.7850925Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:22:12.7852375Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/pipelining/test_unflatten.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:22:12.784788] 2025-12-04T10:22:18.7144287Z 2025-12-04T10:22:18.7145570Z distributed/pipelining/test_unflatten 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.pipelining.test_unflatten_1.1_d494e6526239495e_.log 2025-12-04T10:22:18.7147444Z Running 1 items in this shard: test/distributed/pipelining/test_unflatten.py::UnflattenTestsCUDA::test_unflatten_cuda 2025-12-04T10:22:18.7148069Z 2025-12-04T10:22:18.7148495Z Finished distributed/pipelining/test_unflatten 1/1 ... [2025-12-04 10:22:18.713886][5364.816016946], took 0.10min 2025-12-04T10:22:18.7362572Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.pipelining.test_unflatten/distributed.pipelining.test_unflatten-9c61fbce9d8da54e.xml 2025-12-04T10:22:18.7715289Z Running distributed/tensor/test_dtensor_testbase 1/1 ... [2025-12-04 10:22:18.770916][5364.873047299] 2025-12-04T10:22:18.7715955Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:22:18.7717249Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_dtensor_testbase.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:22:18.771265] 2025-12-04T10:22:31.3162238Z 2025-12-04T10:22:31.3163552Z distributed/tensor/test_dtensor_testbase 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_dtensor_testbase_1.1_125742ee6f314706_.log 2025-12-04T10:22:31.3165172Z Running 1 items in this shard: test/distributed/tensor/test_dtensor_testbase.py::DTensorTestBaseUtilCPUTest::test_dtensor_testbase_destroy_pg 2025-12-04T10:22:31.3166232Z 2025-12-04T10:22:31.3166680Z Finished distributed/tensor/test_dtensor_testbase 1/1 ... [2025-12-04 10:22:31.315749][5377.41786345], took 0.21min 2025-12-04T10:22:31.3380103Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_dtensor_testbase/distributed.tensor.test_dtensor_testbase-90ed30d8fe3a2fcc.xml 2025-12-04T10:22:31.4313346Z Running distributed/tensor/test_redistribute 1/2 ... [2025-12-04 10:22:31.430744][5377.532875169] 2025-12-04T10:22:31.4314057Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:22:31.4315515Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_redistribute.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:22:31.431090] 2025-12-04T10:24:03.7271997Z 2025-12-04T10:24:03.7273153Z distributed/tensor/test_redistribute 1/2 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_redistribute_1.2_4b7d9ba5bb6931ec_.log 2025-12-04T10:24:03.7289808Z Running 25 items in this shard: test/distributed/tensor/test_redistribute.py::RedistributeTest::test_partial_to_replicate_forward_backward_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_partial_to_shard_float32, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_redistribute_negative_shard_dim, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_redistribute_shard_dim_change_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_redistribute_shard_dim_change_float32, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_redistribute_to_partial, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_redistribute_uneven_sharding, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_replicate_to_partial, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_replicate_to_replicate_forward_backward, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_shard_to_replicate_forward_backward_datatype_conversion, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_shard_to_replicate_forward_backward_float32, test/distributed/tensor/test_redistribute.py::MultiDimRedistributeTest::test_multi_dim_mesh, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTest::test_ordered_redistribute, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTest::test_ordered_redistribute_for_special_placement, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_partial_to_shard_float32, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_redistribute_shard_dim_change_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_redistribute_shard_dim_change_float32, 
test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_replicate_to_local_partial_grad_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_replicate_to_local_partial_grad_float32, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_replicate_to_shard_forward_backward, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_shard_dim_alltoall_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_shard_to_replicate_forward_backward_complex64, test/distributed/tensor/test_redistribute.py::MultiDimRedistributeTestWithLocalTensor::test_redistribute_shard_dim_multi_dim_mesh, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTestWithLocalTensor::test_ordered_distribute_all_combination, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTestWithLocalTensor::test_shard_order_same_data_as_strided_shard 2025-12-04T10:24:03.7305581Z 2025-12-04T10:24:03.7306002Z Finished distributed/tensor/test_redistribute 1/2 ... [2025-12-04 10:24:03.726896][5469.829026002], took 1.54min 2025-12-04T10:24:03.7497044Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_redistribute/distributed.tensor.test_redistribute-1fba6503450910ca.xml 2025-12-04T10:24:03.8329125Z Running distributed/test_nvshmem 1/1 ... [2025-12-04 10:24:03.832330][5469.934461477] 2025-12-04T10:24:03.8329702Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:24:03.8331094Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_nvshmem.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:24:03.832670] 2025-12-04T10:24:07.3562467Z 2025-12-04T10:24:07.3563478Z distributed/test_nvshmem 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_nvshmem_1.1_42a48f707fc7fbe7_.log 2025-12-04T10:24:07.3587484Z Running 47 items in this shard: test/distributed/test_nvshmem.py::NVSHMEMSymmetricMemoryTest::test_alloc, test/distributed/test_nvshmem.py::NVSHMEMSymmetricMemoryTest::test_alloc_without_device_context, test/distributed/test_nvshmem.py::NVSHMEMSymmetricMemoryTest::test_get_remote_tensor, test/distributed/test_nvshmem.py::NVSHMEMSymmetricMemoryTest::test_get_remote_tensors, test/distributed/test_nvshmem.py::NVSHMEMSymmetricMemoryTest::test_handle_offset, test/distributed/test_nvshmem.py::NVSHMEMSymmetricMemoryTest::test_mempool_compute_ops, test/distributed/test_nvshmem.py::NVSHMEMSymmetricMemoryTest::test_mempool_tensor_factory, test/distributed/test_nvshmem.py::NVSHMEMSymmetricMemoryTest::test_mempool_tensor_w_collective, test/distributed/test_nvshmem.py::NVSHMEMSymmetricMemoryTest::test_nvshmem_get, test/distributed/test_nvshmem.py::NVSHMEMSymmetricMemoryTest::test_nvshmem_put, test/distributed/test_nvshmem.py::NVSHMEMAll2AllTest::test_all_to_all_vdev, test/distributed/test_nvshmem.py::NVSHMEMAll2AllTest::test_all_to_all_vdev_2d_align_1, test/distributed/test_nvshmem.py::NVSHMEMAll2AllTest::test_all_to_all_vdev_2d_align_16, test/distributed/test_nvshmem.py::NVSHMEMAll2AllTest::test_all_to_all_vdev_2d_align_8, test/distributed/test_nvshmem.py::NVSHMEMAll2AllTest::test_all_to_all_vdev_2d_offset, test/distributed/test_nvshmem.py::NVSHMEMAll2AllTest::test_nvshmem_all_to_all, test/distributed/test_nvshmem.py::DispatchCombineTest::test_dispatch_combine_align_1, test/distributed/test_nvshmem.py::DispatchCombineTest::test_dispatch_combine_align_16, test/distributed/test_nvshmem.py::DispatchCombineTest::test_dispatch_combine_align_8, test/distributed/test_nvshmem.py::DispatchCombineInSubgroups::test_dispatch_combine_subgroup, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_128_root_ratio_1_bfloat16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_128_root_ratio_1_float16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_128_root_ratio_1_float32, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_128_root_ratio_2_bfloat16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_128_root_ratio_2_float16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_128_root_ratio_2_float32, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_32_root_ratio_1_bfloat16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_32_root_ratio_1_float16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_32_root_ratio_1_float32, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_32_root_ratio_2_bfloat16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_32_root_ratio_2_float16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_32_root_ratio_2_float32, 
test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_512_root_ratio_1_bfloat16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_512_root_ratio_1_float16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_512_root_ratio_1_float32, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_512_root_ratio_2_bfloat16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_512_root_ratio_2_float16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_multi_root_tile_reduce_tile_size_512_root_ratio_2_float32, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_tile_reduce_tile_size_128_bfloat16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_tile_reduce_tile_size_128_float16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_tile_reduce_tile_size_128_float32, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_tile_reduce_tile_size_32_bfloat16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_tile_reduce_tile_size_32_float16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_tile_reduce_tile_size_32_float32, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_tile_reduce_tile_size_512_bfloat16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_tile_reduce_tile_size_512_float16, test/distributed/test_nvshmem.py::NVSHMEMTileCommTest::test_tile_reduce_tile_size_512_float32 2025-12-04T10:24:07.3609847Z 2025-12-04T10:24:07.3610164Z Finished distributed/test_nvshmem 1/1 ... [2025-12-04 10:24:07.355798][5473.457927962], took 0.06min 2025-12-04T10:24:07.3786317Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_nvshmem/distributed.test_nvshmem-c601d1a92c913214.xml 2025-12-04T10:24:07.4105598Z Running distributed/tensor/test_attention 1/1 ... [2025-12-04 10:24:07.409928][5473.512059008] 2025-12-04T10:24:07.4106224Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:24:07.4107479Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_attention.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:24:07.410304] 2025-12-04T10:24:55.4910946Z 2025-12-04T10:24:55.4914690Z distributed/tensor/test_attention 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_attention_1.1_cb19c8955a160060_.log 2025-12-04T10:24:55.4922666Z Running 14 items in this shard: test/distributed/tensor/test_attention.py::RingAttentionTest::test_is_causal_behavior, test/distributed/tensor/test_attention.py::RingAttentionTest::test_ring_attention_sdpa, test/distributed/tensor/test_attention.py::CPFlexAttentionTest::test_cp_flex_attention_causal_mask, test/distributed/tensor/test_attention.py::CPFlexAttentionTest::test_cp_flex_attention_document_mask, test/distributed/tensor/test_attention.py::TestCPCustomOps::test_flex_cp_custom_op, test/distributed/tensor/test_attention.py::TestSharding::test_attention_shard_without_cp, test/distributed/tensor/test_attention.py::TestSharding::test_context_parallel_shard, test/distributed/tensor/test_attention.py::RingAttentionTestWithLocalTensor::test_is_causal_behavior, test/distributed/tensor/test_attention.py::RingAttentionTestWithLocalTensor::test_ring_attention_sdpa, test/distributed/tensor/test_attention.py::CPFlexAttentionTestWithLocalTensor::test_cp_flex_attention_causal_mask, test/distributed/tensor/test_attention.py::CPFlexAttentionTestWithLocalTensor::test_cp_flex_attention_document_mask, test/distributed/tensor/test_attention.py::TestCPCustomOpsWithLocalTensor::test_flex_cp_custom_op, test/distributed/tensor/test_attention.py::TestShardingWithLocalTensor::test_attention_shard_without_cp, test/distributed/tensor/test_attention.py::TestShardingWithLocalTensor::test_context_parallel_shard 2025-12-04T10:24:55.4930403Z 2025-12-04T10:24:55.4930797Z Finished distributed/tensor/test_attention 1/1 ... [2025-12-04 10:24:55.490539][5521.592668469], took 0.80min 2025-12-04T10:24:55.5136825Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_attention/distributed.tensor.test_attention-f7e42a024369f922.xml 2025-12-04T10:24:55.6098290Z Running distributed/tensor/test_convolution_ops 1/1 ... [2025-12-04 10:24:55.609600][5521.71173053] 2025-12-04T10:24:55.6098956Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:24:55.6101699Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_convolution_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:24:55.609940] 2025-12-04T10:26:07.2033357Z 2025-12-04T10:26:07.2034550Z distributed/tensor/test_convolution_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_convolution_ops_1.1_49a43fa49632a258_.log 2025-12-04T10:26:07.2044586Z Running 16 items in this shard: test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTest::test_conv1d, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTest::test_conv2d_module_no_bias, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTest::test_conv2d_no_bias_backward, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTest::test_conv2d_no_bias_compile, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTest::test_conv3d, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTest::test_conv_backward_none_grad_inp, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTest::test_depthwise_convolution, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTest::test_downsampling_convolution, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTestWithLocalTensor::test_conv1d, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTestWithLocalTensor::test_conv2d_module_no_bias, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTestWithLocalTensor::test_conv2d_no_bias_backward, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTestWithLocalTensor::test_conv2d_no_bias_compile, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTestWithLocalTensor::test_conv3d, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTestWithLocalTensor::test_conv_backward_none_grad_inp, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTestWithLocalTensor::test_depthwise_convolution, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTestWithLocalTensor::test_downsampling_convolution 2025-12-04T10:26:07.2054073Z 2025-12-04T10:26:07.2054517Z Finished distributed/tensor/test_convolution_ops 1/1 ... [2025-12-04 10:26:07.202811][5593.304941891], took 1.19min 2025-12-04T10:26:07.2258992Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_convolution_ops/distributed.tensor.test_convolution_ops-37d4b4387fe7c9dd.xml 2025-12-04T10:26:07.3217675Z Running distributed/checkpoint/fsdp/test_fsdp_dsd 1/1 ... [2025-12-04 10:26:07.321528][5593.423659392] 2025-12-04T10:26:07.3218373Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:26:07.3220908Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/fsdp/test_fsdp_dsd.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:26:07.321881] 2025-12-04T10:26:52.8015205Z 2025-12-04T10:26:52.8016475Z distributed/checkpoint/fsdp/test_fsdp_dsd 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.fsdp.test_fsdp_dsd_1.1_f8fe2a82b83f915c_.log 2025-12-04T10:26:52.8022043Z Running 6 items in this shard: test/distributed/checkpoint/fsdp/test_fsdp_dsd.py::TestFullyShardWithDistributedStateDict::test_1d_fsdp_cpu_offload_full_model_state_dict, test/distributed/checkpoint/fsdp/test_fsdp_dsd.py::TestFullyShardWithDistributedStateDict::test_1d_fsdp_get_model_state_dict, test/distributed/checkpoint/fsdp/test_fsdp_dsd.py::TestFullyShardWithDistributedStateDict::test_save_with_fsdp1_and_load_with_fsdp2, test/distributed/checkpoint/fsdp/test_fsdp_dsd.py::TestFullyShardWithDistributedStateDict::test_save_with_fsdp1_and_load_with_fsdp2_tp, test/distributed/checkpoint/fsdp/test_fsdp_dsd.py::TestFullyShardWithDistributedStateDict::test_save_with_fsdp2_tp_and_load_with_tp, test/distributed/checkpoint/fsdp/test_fsdp_dsd.py::TestFullyShardWithDistributedStateDict::test_save_with_tp_and_load_with_fsdp2_tp 2025-12-04T10:26:52.8026365Z 2025-12-04T10:26:52.8026801Z Finished distributed/checkpoint/fsdp/test_fsdp_dsd 1/1 ... [2025-12-04 10:26:52.801015][5638.903145701], took 0.76min 2025-12-04T10:26:52.8246005Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.fsdp.test_fsdp_dsd/distributed.checkpoint.fsdp.test_fsdp_dsd-1bb4a1e7d3cbef72.xml 2025-12-04T10:26:52.9050856Z Running distributed/checkpoint/test_save_load_api 1/1 ... [2025-12-04 10:26:52.904396][5639.006527824] 2025-12-04T10:26:52.9051538Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:26:52.9052841Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_save_load_api.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:26:52.904741] 2025-12-04T10:27:06.9553714Z 2025-12-04T10:27:06.9555262Z distributed/checkpoint/test_save_load_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_save_load_api_1.1_c6bb8dfab455cfa5_.log 2025-12-04T10:27:06.9558372Z Running 2 items in this shard: test/distributed/checkpoint/test_save_load_api.py::TestSaveAndLoadAPI::test_assert_same_keys, test/distributed/checkpoint/test_save_load_api.py::TestSaveAndLoadAPI::test_auto_detect 2025-12-04T10:27:06.9559978Z 2025-12-04T10:27:06.9560428Z Finished distributed/checkpoint/test_save_load_api 1/1 ... [2025-12-04 10:27:06.954828][5653.056958693], took 0.23min 2025-12-04T10:27:06.9788180Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_save_load_api/distributed.checkpoint.test_save_load_api-607495a1278fa4ba.xml 2025-12-04T10:27:07.0515178Z Running distributed/tensor/debug/test_comm_mode_features 1/1 ... [2025-12-04 10:27:07.050939][5653.15307035] 2025-12-04T10:27:07.0516019Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:27:07.0517500Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/debug/test_comm_mode_features.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:27:07.051291] 2025-12-04T10:27:34.7853273Z 2025-12-04T10:27:34.7854711Z distributed/tensor/debug/test_comm_mode_features 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.debug.test_comm_mode_features_1.1_03fdf3eadd2ab611_.log 2025-12-04T10:27:34.7858306Z Running 4 items in this shard: test/distributed/tensor/debug/test_comm_mode_features.py::TestCommModeFeatures::test_MLPStacked_distributed_sharding_display, test/distributed/tensor/debug/test_comm_mode_features.py::TestCommModeFeatures::test_MLP_distributed_sharding_display, test/distributed/tensor/debug/test_comm_mode_features.py::TestCommModeFeatures::test_MLP_module_tracing, test/distributed/tensor/debug/test_comm_mode_features.py::TestCommModeFeatures::test_transformer_module_tracing 2025-12-04T10:27:34.7862707Z 2025-12-04T10:27:34.7863201Z Finished distributed/tensor/debug/test_comm_mode_features 1/1 ... [2025-12-04 10:27:34.784875][5680.887006427], took 0.46min 2025-12-04T10:27:34.8087078Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.debug.test_comm_mode_features/distributed.tensor.debug.test_comm_mode_features-f7a4a3df89327d4b.xml 2025-12-04T10:27:34.9081709Z Running distributed/tensor/test_dtensor_ops 1/1 ... [2025-12-04 10:27:34.907730][5681.009861173] 2025-12-04T10:27:34.9082373Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:27:34.9083859Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_dtensor_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:27:34.908101] 2025-12-04T10:27:40.6683495Z 2025-12-04T10:27:40.6684678Z distributed/tensor/test_dtensor_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_dtensor_ops_1.1_67309fc460535665_.log 2025-12-04T10:27:40.6686290Z Running 0 items in this shard: 2025-12-04T10:27:40.6686556Z 2025-12-04T10:27:40.6686996Z Finished distributed/tensor/test_dtensor_ops 1/1 ... [2025-12-04 10:27:40.668178][5686.770309519], took 0.10min 2025-12-04T10:27:40.6923239Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_dtensor_ops/distributed.tensor.test_dtensor_ops-ea81859469c32dce.xml 2025-12-04T10:27:40.7178141Z Running distributed/test_debug 1/1 ... [2025-12-04 10:27:40.717595][5686.819727282] 2025-12-04T10:27:40.7178937Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:27:40.7181703Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_debug.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:27:40.717951] 2025-12-04T10:27:45.2936884Z 2025-12-04T10:27:45.2937979Z distributed/test_debug 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_debug_1.1_a56c6136c9a8abc4_.log 2025-12-04T10:27:45.2939142Z Running 1 items in this shard: test/distributed/test_debug.py::TestDebug::test_all 2025-12-04T10:27:45.2939690Z 2025-12-04T10:27:45.2940013Z Finished distributed/test_debug 1/1 ... 
[2025-12-04 10:27:45.293516][5691.395646158], took 0.08min 2025-12-04T10:27:45.3170621Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_debug/distributed.test_debug-be889cccd8acb9a9.xml 2025-12-04T10:27:45.3513889Z Running distributed/test_overlap_bucketing_unit 1/1 ... [2025-12-04 10:27:45.350782][5691.452913481] 2025-12-04T10:27:45.3514527Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:27:45.3515811Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_overlap_bucketing_unit.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:27:45.351127] 2025-12-04T10:27:56.0912592Z 2025-12-04T10:27:56.0913790Z distributed/test_overlap_bucketing_unit 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_overlap_bucketing_unit_1.1_9dbb1a52f33d29c7_.log 2025-12-04T10:27:56.0920514Z Running 9 items in this shard: test/distributed/test_overlap_bucketing_unit.py::TestOverlapPreservingBucketing::test_can_bucket_all_reduce, test/distributed/test_overlap_bucketing_unit.py::TestOverlapPreservingBucketing::test_can_bucket_independent_collectives, test/distributed/test_overlap_bucketing_unit.py::TestOverlapPreservingBucketing::test_can_bucket_multidtype_collectives, test/distributed/test_overlap_bucketing_unit.py::TestOverlapPreservingBucketing::test_can_bucket_with_convert_dtype_as_hiding_nodes, test/distributed/test_overlap_bucketing_unit.py::TestOverlapPreservingBucketing::test_can_bucket_with_multiple_hiding_nodes, test/distributed/test_overlap_bucketing_unit.py::TestOverlapPreservingBucketing::test_cant_bucket_ag_with_rs_hiding_interval_between_final_mm_hidden_False, test/distributed/test_overlap_bucketing_unit.py::TestOverlapPreservingBucketing::test_cant_bucket_ag_with_rs_hiding_interval_between_final_mm_hidden_True, test/distributed/test_overlap_bucketing_unit.py::TestOverlapPreservingBucketing::test_cant_bucket_nested_hiding_intervals, test/distributed/test_overlap_bucketing_unit.py::TestCrossPGOverlap::test_cross_pg_prefetch_during_exposed_wait 2025-12-04T10:27:56.0926946Z 2025-12-04T10:27:56.0927419Z Finished distributed/test_overlap_bucketing_unit 1/1 ... [2025-12-04 10:27:56.090861][5702.192975855], took 0.18min 2025-12-04T10:27:56.1145270Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_overlap_bucketing_unit/distributed.test_overlap_bucketing_unit-ca2c159a43fd5a2e.xml 2025-12-04T10:27:56.2142177Z Running distributed/checkpoint/_experimental/test_checkpoint_writer 1/1 ... [2025-12-04 10:27:56.213974][5702.31610546] 2025-12-04T10:27:56.2142962Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:27:56.2145237Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/_experimental/test_checkpoint_writer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:27:56.214310] 2025-12-04T10:28:00.0889537Z 2025-12-04T10:28:00.0891104Z distributed/checkpoint/_experimental/test_checkpoint_writer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint._experimental.test_checkpoint_writer_1.1_1dd840a86f907337_.log 2025-12-04T10:28:00.0897910Z Running 8 items in this shard: test/distributed/checkpoint/_experimental/test_checkpoint_writer.py::TestCheckpointWriterConfig::test_custom_values, test/distributed/checkpoint/_experimental/test_checkpoint_writer.py::TestCheckpointWriterConfig::test_default_values, test/distributed/checkpoint/_experimental/test_checkpoint_writer.py::TestCheckpointWriter::test_close, test/distributed/checkpoint/_experimental/test_checkpoint_writer.py::TestCheckpointWriter::test_write_calls_barrier, test/distributed/checkpoint/_experimental/test_checkpoint_writer.py::TestCheckpointWriter::test_write_calls_commit_hooks, test/distributed/checkpoint/_experimental/test_checkpoint_writer.py::TestCheckpointWriter::test_write_creates_checkpoint_file, test/distributed/checkpoint/_experimental/test_checkpoint_writer.py::TestCheckpointWriter::test_write_without_barrier, test/distributed/checkpoint/_experimental/test_checkpoint_writer.py::TestCheckpointWriter::test_write_without_commit_hook 2025-12-04T10:28:00.0903046Z 2025-12-04T10:28:00.0903573Z Finished distributed/checkpoint/_experimental/test_checkpoint_writer 1/1 ... [2025-12-04 10:28:00.088574][5706.190700754], took 0.06min 2025-12-04T10:28:00.1127980Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint._experimental.test_checkpoint_writer/distributed.checkpoint._experimental.test_checkpoint_writer-b51ac79d06c0ddb7.xml 2025-12-04T10:28:00.1450035Z Running distributed/checkpoint/_experimental/test_checkpointer 1/1 ... [2025-12-04 10:28:00.144430][5706.246561397] 2025-12-04T10:28:00.1450810Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:28:00.1452200Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/_experimental/test_checkpointer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:28:00.144771] 2025-12-04T10:28:37.5496312Z 2025-12-04T10:28:37.5497744Z distributed/checkpoint/_experimental/test_checkpointer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint._experimental.test_checkpointer_1.1_7614bf1ed13f1d86_.log 2025-12-04T10:28:37.5505924Z Running 11 items in this shard: test/distributed/checkpoint/_experimental/test_checkpointer.py::TestCheckpointer::test_load_strict_mode, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestCheckpointer::test_load_with_map_location, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestCheckpointer::test_nested_dict_partial_load, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestCheckpointer::test_partial_load, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestCheckpointer::test_save_and_load_basic, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestCheckpointer::test_save_with_kwargs, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestAsyncCheckpointerSpecific::test_async_error_handling, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestAsyncCheckpointerSpecific::test_async_future_results, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestAsyncCheckpointerSpecific::test_async_multiple_saves_ordering, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestAsyncCheckpointerSpecific::test_async_returns_futures, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestAsyncCheckpointerSpecific::test_async_sequential_saves_wait 2025-12-04T10:28:37.5512705Z 2025-12-04T10:28:37.5513202Z Finished distributed/checkpoint/_experimental/test_checkpointer 1/1 ... [2025-12-04 10:28:37.549201][5743.651315545], took 0.62min 2025-12-04T10:28:37.5737048Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint._experimental.test_checkpointer/distributed.checkpoint._experimental.test_checkpointer-181aea9ab4e75ef7.xml 2025-12-04T10:28:37.6512919Z Running distributed/tensor/test_init 1/1 ... [2025-12-04 10:28:37.650712][5743.752844417] 2025-12-04T10:28:37.6513530Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:28:37.6514939Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_init.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:28:37.651055] 2025-12-04T10:29:23.9325564Z 2025-12-04T10:29:23.9326690Z distributed/tensor/test_init 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_init_1.1_e83e12e555837820_.log 2025-12-04T10:29:23.9333962Z Running 13 items in this shard: test/distributed/tensor/test_init.py::DTensorInitOpsTest::test_init_ops, test/distributed/tensor/test_init.py::DTensorConstructorTest::test_empty, test/distributed/tensor/test_init.py::DTensorConstructorTest::test_full, test/distributed/tensor/test_init.py::DTensorConstructorTest::test_ones, test/distributed/tensor/test_init.py::DTensorConstructorTest::test_zeros, test/distributed/tensor/test_init.py::DTensorConstructorTest::test_zeros_full_mesh, test/distributed/tensor/test_init.py::DTensorConstructorTest::test_zeros_submesh, test/distributed/tensor/test_init.py::DTensorConstructorTestWithLocalTensor::test_empty, test/distributed/tensor/test_init.py::DTensorConstructorTestWithLocalTensor::test_full, test/distributed/tensor/test_init.py::DTensorConstructorTestWithLocalTensor::test_ones, test/distributed/tensor/test_init.py::DTensorConstructorTestWithLocalTensor::test_zeros, test/distributed/tensor/test_init.py::DTensorConstructorTestWithLocalTensor::test_zeros_full_mesh, test/distributed/tensor/test_init.py::DTensorConstructorTestWithLocalTensor::test_zeros_submesh 2025-12-04T10:29:23.9340208Z 2025-12-04T10:29:23.9340569Z Finished distributed/tensor/test_init 1/1 ... [2025-12-04 10:29:23.932075][5790.03420565], took 0.77min 2025-12-04T10:29:23.9569266Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_init/distributed.tensor.test_init-b970b50400f392fc.xml 2025-12-04T10:29:24.0378147Z Running distributed/_composable/test_checkpoint 1/1 ... [2025-12-04 10:29:24.037564][5790.13969504] 2025-12-04T10:29:24.0379061Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:29:24.0381246Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/test_checkpoint.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:29:24.037922] 2025-12-04T10:29:28.3631906Z 2025-12-04T10:29:28.3633402Z distributed/_composable/test_checkpoint 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.test_checkpoint_1.1_abf6f06f0264530a_.log 2025-12-04T10:29:28.3637112Z Running 6 items in this shard: test/distributed/_composable/test_checkpoint.py::TestCheckpoint::test_checkpoint_kwargs, test/distributed/_composable/test_checkpoint.py::TestCheckpoint::test_clears_state_on_error_in_forward, test/distributed/_composable/test_checkpoint.py::TestCheckpoint::test_multi_args, test/distributed/_composable/test_checkpoint.py::TestCheckpoint::test_random_cpu, test/distributed/_composable/test_checkpoint.py::TestCheckpoint::test_tensor_only_cpu, test/distributed/_composable/test_checkpoint.py::TestCheckpoint::test_tensor_only_gpu 2025-12-04T10:29:28.3639993Z 2025-12-04T10:29:28.3640384Z Finished distributed/_composable/test_checkpoint 1/1 ... 
[2025-12-04 10:29:28.362624][5794.464749883], took 0.07min 2025-12-04T10:29:28.3872925Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.test_checkpoint/distributed._composable.test_checkpoint-a1aa396939174424.xml 2025-12-04T10:29:28.4195724Z Running distributed/_tools/test_fsdp2_mem_tracker 1/1 ... [2025-12-04 10:29:28.418864][5794.520995636] 2025-12-04T10:29:28.4196371Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:29:28.4197819Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_tools/test_fsdp2_mem_tracker.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:29:28.419212] 2025-12-04T10:29:59.2632338Z 2025-12-04T10:29:59.2635908Z distributed/_tools/test_fsdp2_mem_tracker 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._tools.test_fsdp2_mem_tracker_1.1_26e469db93fd8e16_.log 2025-12-04T10:29:59.2639075Z Running 3 items in this shard: test/distributed/_tools/test_fsdp2_mem_tracker.py::TestTrackerFullyShard1DTrainingCore::test_tracker_multi_group_eager, test/distributed/_tools/test_fsdp2_mem_tracker.py::TestTrackerFullyShard1DTrainingCore::test_tracker_non_root_forward_backward, test/distributed/_tools/test_fsdp2_mem_tracker.py::TestTrackerFullyShard1DTrainingCompose::test_tracker_with_activation_checkpointing 2025-12-04T10:29:59.2641176Z 2025-12-04T10:29:59.2641595Z Finished distributed/_tools/test_fsdp2_mem_tracker 1/1 ... [2025-12-04 10:29:59.262761][5825.364891194], took 0.51min 2025-12-04T10:29:59.2904926Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._tools.test_fsdp2_mem_tracker/distributed._tools.test_fsdp2_mem_tracker-3cf763bb11a5de99.xml 2025-12-04T10:29:59.3806242Z Running distributed/_composable/test_replicate_mixed_precision 1/1 ... [2025-12-04 10:29:59.380074][5825.482205133] 2025-12-04T10:29:59.3806982Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:29:59.3808338Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/test_replicate_mixed_precision.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:29:59.380456] 2025-12-04T10:30:21.4990143Z 2025-12-04T10:30:21.4991506Z distributed/_composable/test_replicate_mixed_precision 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.test_replicate_mixed_precision_1.1_01fba1f975bb97b6_.log 2025-12-04T10:30:21.4998670Z Running 9 items in this shard: test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionTraining::test_compute_dtype, test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionTraining::test_grad_acc_with_reduce_dtype, test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionTraining::test_reduce_dtype, test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionCasts::test_clamp_reduce_dtype, test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionCasts::test_dataclass_input, test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionCasts::test_float16_on_one_submodule, test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionCasts::test_norm_modules_bf16, test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionCasts::test_norm_modules_fp16, test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionCasts::test_submodules_with_external_inputs 2025-12-04T10:30:21.5004774Z 2025-12-04T10:30:21.5005256Z Finished distributed/_composable/test_replicate_mixed_precision 1/1 ... [2025-12-04 10:30:21.498672][5847.600801589], took 0.37min 2025-12-04T10:30:21.5267871Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.test_replicate_mixed_precision/distributed._composable.test_replicate_mixed_precision-36b9cc9e417e77fd.xml 2025-12-04T10:30:21.6068233Z Running distributed/checkpoint/e2e/test_fine_tuning 1/1 ... [2025-12-04 10:30:21.606289][5847.708419864] 2025-12-04T10:30:21.6068977Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:30:21.6070468Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/e2e/test_fine_tuning.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:30:21.606663] 2025-12-04T10:30:32.0980820Z 2025-12-04T10:30:32.0983036Z distributed/checkpoint/e2e/test_fine_tuning 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.e2e.test_fine_tuning_1.1_0eddc041b8455dc1_.log 2025-12-04T10:30:32.0984926Z Running 1 items in this shard: test/distributed/checkpoint/e2e/test_fine_tuning.py::TestFineTuning::test_fine_tuning 2025-12-04T10:30:32.0985559Z 2025-12-04T10:30:32.0986023Z Finished distributed/checkpoint/e2e/test_fine_tuning 1/1 ... [2025-12-04 10:30:32.098142][5858.200272013], took 0.17min 2025-12-04T10:30:32.1233439Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.e2e.test_fine_tuning/distributed.checkpoint.e2e.test_fine_tuning-c68ce24632e972fe.xml 2025-12-04T10:30:32.2236159Z Running distributed/tensor/test_matrix_ops 1/1 ... 
[2025-12-04 10:30:32.222908][5858.325040255] 2025-12-04T10:30:32.2236926Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:30:32.2238727Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_matrix_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:30:32.223256] 2025-12-04T10:32:04.5721061Z 2025-12-04T10:32:04.5722238Z distributed/tensor/test_matrix_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_matrix_ops_1.1_08575986e567d5c0_.log 2025-12-04T10:32:04.5737396Z Running 30 items in this shard: test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_addmm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_addmm_auto_redistribute, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_addmm_empty_operand, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_baddbmm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_bmm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_dtensor_mm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_grouped_mm_kwargs0, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_grouped_mm_kwargs1, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_matmul, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_mm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_scaled_dot_product_attention, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_scaled_mm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_t, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_t_partial, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_tensordot_shampoo, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_addmm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_addmm_auto_redistribute, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_addmm_empty_operand, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_baddbmm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_bmm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_dtensor_mm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_grouped_mm_kwargs0, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_grouped_mm_kwargs1, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_matmul, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_mm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_scaled_dot_product_attention, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_scaled_mm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_t, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_t_partial, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_tensordot_shampoo 2025-12-04T10:32:04.5751140Z 2025-12-04T10:32:04.5751650Z Finished distributed/tensor/test_matrix_ops 1/1 ... 
[2025-12-04 10:32:04.571556][5950.673686208], took 1.54min 2025-12-04T10:32:04.5967644Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_matrix_ops/distributed.tensor.test_matrix_ops-225fb5a0fab4f212.xml 2025-12-04T10:32:04.7055789Z Running distributed/tensor/test_optimizers 1/1 ... [2025-12-04 10:32:04.704988][5950.807119552] 2025-12-04T10:32:04.7056452Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:32:04.7057759Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_optimizers.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:32:04.705343] 2025-12-04T10:34:43.2889933Z 2025-12-04T10:34:43.2891093Z distributed/tensor/test_optimizers 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_optimizers_1.1_7e5f1be8728dbdea_.log 2025-12-04T10:34:43.2906674Z Running 24 items in this shard: test/distributed/tensor/test_optimizers.py::TestDTensorOptimizer::test_RMSprop_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizer::test_adadelta_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizer::test_adagrad_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizer::test_adam_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizer::test_adamax_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizer::test_adamw_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizer::test_admaw_fused_across_meshes, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizer::test_asgd_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizer::test_nadam_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizer::test_optimizer_foreach_supported_types_include_DTensor, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizer::test_radam_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizer::test_sgd_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizerWithLocalTensor::test_RMSprop_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizerWithLocalTensor::test_adadelta_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizerWithLocalTensor::test_adagrad_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizerWithLocalTensor::test_adam_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizerWithLocalTensor::test_adamax_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizerWithLocalTensor::test_adamw_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizerWithLocalTensor::test_admaw_fused_across_meshes, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizerWithLocalTensor::test_asgd_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizerWithLocalTensor::test_nadam_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizerWithLocalTensor::test_optimizer_foreach_supported_types_include_DTensor, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizerWithLocalTensor::test_radam_1d_sharding, test/distributed/tensor/test_optimizers.py::TestDTensorOptimizerWithLocalTensor::test_sgd_1d_sharding 2025-12-04T10:34:43.2919387Z 
2025-12-04T10:34:43.2919850Z Finished distributed/tensor/test_optimizers 1/1 ... [2025-12-04 10:34:43.288544][6109.39067507], took 2.64min 2025-12-04T10:34:43.3135737Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_optimizers/distributed.tensor.test_optimizers-208a364b29da2421.xml 2025-12-04T10:34:43.8351576Z Uploading artifacts took 0.44 seconds 2025-12-04T10:34:43.8353315Z Running distributed/test_symmetric_memory 1/1 ... [2025-12-04 10:34:43.835207][6109.937337643] 2025-12-04T10:34:43.8354131Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:34:43.8357112Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_symmetric_memory.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:34:43.835532] 2025-12-04T10:34:48.1611916Z 2025-12-04T10:34:48.1613127Z distributed/test_symmetric_memory 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_symmetric_memory_1.1_64f49cd6a7ec957a_.log 2025-12-04T10:34:48.1668990Z Running 96 items in this shard: test/distributed/test_symmetric_memory.py::SymmetricMemoryTest::test_allow_overlapping_devices, test/distributed/test_symmetric_memory.py::SymmetricMemoryTest::test_cuda_nvlink_connectivity_detection, test/distributed/test_symmetric_memory.py::SymmetricMemoryTest::test_get_backend, test/distributed/test_symmetric_memory.py::SymmetricMemoryTest::test_get_signal_pad, test/distributed/test_symmetric_memory.py::SymmetricMemoryTest::test_has_multicast_support, test/distributed/test_symmetric_memory.py::SymmetricMemoryTest::test_large_alloc, test/distributed/test_symmetric_memory.py::SymmetricMemoryTest::test_low_contention_all_gather_symm_mem_input_False, test/distributed/test_symmetric_memory.py::SymmetricMemoryTest::test_low_contention_all_gather_symm_mem_input_True, test/distributed/test_symmetric_memory.py::SymmetricMemoryTest::test_low_contention_reduce_scatter_reduce_op_avg_symm_mem_input_False, test/distributed/test_symmetric_memory.py::SymmetricMemoryTest::test_low_contention_reduce_scatter_reduce_op_avg_symm_mem_input_True, test/distributed/test_symmetric_memory.py::SymmetricMemoryTest::test_low_contention_reduce_scatter_reduce_op_sum_symm_mem_input_False, test/distributed/test_symmetric_memory.py::SymmetricMemoryTest::test_low_contention_reduce_scatter_reduce_op_sum_symm_mem_input_True, test/distributed/test_symmetric_memory.py::SymmetricMemoryTest::test_subgroup, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_all_gather_matmul_gather_dim_0, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_all_gather_matmul_gather_dim_1, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_all_gather_matmul_gather_dim_2, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_all_gather_matmul_native_symm_mem_input_False_is_b_row_major_False, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_all_gather_matmul_native_symm_mem_input_False_is_b_row_major_True, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_all_gather_matmul_native_symm_mem_input_True_is_b_row_major_False, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_all_gather_matmul_native_symm_mem_input_True_is_b_row_major_True, 
test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_all_gather_scaled_matmul_gather_dim_0_scale_mode_row-wise-replicated, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_all_gather_scaled_matmul_gather_dim_0_scale_mode_row-wise-sharded, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_all_gather_scaled_matmul_gather_dim_0_scale_mode_tensor-wise, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_all_gather_scaled_matmul_gather_dim_1_scale_mode_row-wise-replicated, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_all_gather_scaled_matmul_gather_dim_1_scale_mode_row-wise-sharded, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_all_gather_scaled_matmul_gather_dim_1_scale_mode_tensor-wise, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_matmul_reduce_scatter_scatter_dim_0, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_matmul_reduce_scatter_scatter_dim_1, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_matmul_reduce_scatter_scatter_dim_2, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_scaled_matmul_reduce_scatter_scatter_dim_0_rowwise_False, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_scaled_matmul_reduce_scatter_scatter_dim_0_rowwise_True, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_scaled_matmul_reduce_scatter_scatter_dim_1_rowwise_False, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_fused_scaled_matmul_reduce_scatter_scatter_dim_1_rowwise_True, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_multimem_all_gather_matmul, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_optimal_layout_dim_0, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_optimal_layout_dim_1, test/distributed/test_symmetric_memory.py::AsyncTPTest::test_optimal_layout_dim_2, test/distributed/test_symmetric_memory.py::SymmMemEmptySetDeviceTest::test_empty_strided_p2p_persistent_set_device_False, test/distributed/test_symmetric_memory.py::SymmMemEmptySetDeviceTest::test_empty_strided_p2p_persistent_set_device_True, test/distributed/test_symmetric_memory.py::SymmMemEmptySetDeviceTest::test_empty_strided_p2p_set_device_False, test/distributed/test_symmetric_memory.py::SymmMemEmptySetDeviceTest::test_empty_strided_p2p_set_device_True, test/distributed/test_symmetric_memory.py::SymmMemNegativeTest::test_barrier_timeout, test/distributed/test_symmetric_memory.py::SymmMemNegativeTest::test_put_signal_timeout, test/distributed/test_symmetric_memory.py::SymmMemNegativeTest::test_wait_signal_timeout, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_gather_align_bytes_16, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_gather_align_bytes_4, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_gather_align_bytes_8, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_bfloat16_align_bytes_16_size_bytes_4, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_bfloat16_align_bytes_16_size_bytes_8192, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_bfloat16_align_bytes_16_size_bytes_8196, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_bfloat16_align_bytes_4_size_bytes_4, 
test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_bfloat16_align_bytes_4_size_bytes_8192, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_bfloat16_align_bytes_4_size_bytes_8196, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_bfloat16_align_bytes_8_size_bytes_4, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_bfloat16_align_bytes_8_size_bytes_8192, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_bfloat16_align_bytes_8_size_bytes_8196, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_float32_align_bytes_16_size_bytes_4, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_float32_align_bytes_16_size_bytes_8192, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_float32_align_bytes_16_size_bytes_8196, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_float32_align_bytes_4_size_bytes_4, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_float32_align_bytes_4_size_bytes_8192, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_float32_align_bytes_4_size_bytes_8196, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_float32_align_bytes_8_size_bytes_4, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_float32_align_bytes_8_size_bytes_8192, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_all_reduce_float32_align_bytes_8_size_bytes_8196, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_bfloat16_align_bytes_16_size_bytes_4, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_bfloat16_align_bytes_16_size_bytes_8192, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_bfloat16_align_bytes_16_size_bytes_8196, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_bfloat16_align_bytes_4_size_bytes_4, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_bfloat16_align_bytes_4_size_bytes_8192, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_bfloat16_align_bytes_4_size_bytes_8196, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_bfloat16_align_bytes_8_size_bytes_4, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_bfloat16_align_bytes_8_size_bytes_8192, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_bfloat16_align_bytes_8_size_bytes_8196, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_float32_align_bytes_16_size_bytes_4, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_float32_align_bytes_16_size_bytes_8192, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_float32_align_bytes_16_size_bytes_8196, 
test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_float32_align_bytes_4_size_bytes_4, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_float32_align_bytes_4_size_bytes_8192, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_float32_align_bytes_4_size_bytes_8196, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_float32_align_bytes_8_size_bytes_4, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_float32_align_bytes_8_size_bytes_8192, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_all_reduce_float32_align_bytes_8_size_bytes_8196, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_reduce_out_bfloat16_size_bytes_4, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_reduce_out_bfloat16_size_bytes_8192, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_reduce_out_bfloat16_size_bytes_8196, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_reduce_out_float32_size_bytes_4, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_reduce_out_float32_size_bytes_8192, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_multimem_one_shot_reduce_out_float32_size_bytes_8196, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_one_shot_all_reduce, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_reduce_scatter, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_reduce_scatter_corner_cases, test/distributed/test_symmetric_memory.py::SymmMemCollectiveTest::test_two_shot_all_reduce, test/distributed/test_symmetric_memory.py::LoweringTest::test_lowering_one_shot_all_reduce, test/distributed/test_symmetric_memory.py::SymmMemSingleProcTest::test_memset32, test/distributed/test_symmetric_memory.py::SymmMemSingleProcTest::test_stream_write_value32 2025-12-04T10:34:48.1723285Z 2025-12-04T10:34:48.1723662Z Finished distributed/test_symmetric_memory 1/1 ... [2025-12-04 10:34:48.160850][6114.262979463], took 0.07min 2025-12-04T10:34:48.1860713Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_symmetric_memory/distributed.test_symmetric_memory-e6666f579f07be4f.xml 2025-12-04T10:34:48.2190998Z Running distributed/_tools/test_runtime_estimator 1/1 ... [2025-12-04 10:34:48.218574][6114.320706083] 2025-12-04T10:34:48.2191660Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:34:48.2192948Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_tools/test_runtime_estimator.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:34:48.218913] 2025-12-04T10:35:03.2678033Z 2025-12-04T10:35:03.2679745Z distributed/_tools/test_runtime_estimator 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._tools.test_runtime_estimator_1.1_65959848b6aa0401_.log 2025-12-04T10:35:03.2681826Z Running 2 items in this shard: test/distributed/_tools/test_runtime_estimator.py::TestRuntimeEstimator::test_conv_model_runtime, test/distributed/_tools/test_runtime_estimator.py::TestRuntimeEstimator::test_transformer_runtime 2025-12-04T10:35:03.2683003Z 2025-12-04T10:35:03.2683452Z Finished distributed/_tools/test_runtime_estimator 1/1 ... [2025-12-04 10:35:03.267180][6129.369305205], took 0.25min 2025-12-04T10:35:03.2929561Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._tools.test_runtime_estimator/distributed._tools.test_runtime_estimator-9422bc676dd3a656.xml 2025-12-04T10:35:03.3698874Z Running distributed/_composable/test_replicate_with_compiler 1/1 ... [2025-12-04 10:35:03.369642][6129.471773774] 2025-12-04T10:35:03.3699696Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:35:03.3701801Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/test_replicate_with_compiler.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:35:03.369991] 2025-12-04T10:37:56.2554586Z 2025-12-04T10:37:56.2556234Z distributed/_composable/test_replicate_with_compiler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.test_replicate_with_compiler_1.1_eea3e5fa3a5dfc0a_.log 2025-12-04T10:37:56.2562189Z Running 10 items in this shard: test/distributed/_composable/test_replicate_with_compiler.py::ReplicateTest::test_bucketing_coalesced_op, test/distributed/_composable/test_replicate_with_compiler.py::ReplicateTest::test_bucketing_concat_op, test/distributed/_composable/test_replicate_with_compiler.py::ReplicateTest::test_compile_backward_only, test/distributed/_composable/test_replicate_with_compiler.py::ReplicateTest::test_compile_bf16, test/distributed/_composable/test_replicate_with_compiler.py::ReplicateTest::test_compile_cpu, test/distributed/_composable/test_replicate_with_compiler.py::ReplicateTest::test_compile_cpu_no_sync, test/distributed/_composable/test_replicate_with_compiler.py::ReplicateTest::test_compile_fp16, test/distributed/_composable/test_replicate_with_compiler.py::ReplicateTest::test_compile_gpu, test/distributed/_composable/test_replicate_with_compiler.py::ReplicateTest::test_compile_gpu_ac, test/distributed/_composable/test_replicate_with_compiler.py::DDP_TP_Test::test_ddp_tp 2025-12-04T10:37:56.2567043Z 2025-12-04T10:37:56.2567594Z Finished distributed/_composable/test_replicate_with_compiler 1/1 ... [2025-12-04 10:37:56.254918][6302.357048004], took 2.88min 2025-12-04T10:37:56.2805470Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.test_replicate_with_compiler/distributed._composable.test_replicate_with_compiler-845dc753fbff3b86.xml 2025-12-04T10:37:56.4149172Z Running distributed/_composable/fsdp/test_fully_shard_autograd 1/1 ... 
[2025-12-04 10:37:56.414346][6302.516477937] 2025-12-04T10:37:56.4150115Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:37:56.4151466Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_autograd.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:37:56.414690] 2025-12-04T10:38:27.2572869Z 2025-12-04T10:38:27.2574586Z distributed/_composable/fsdp/test_fully_shard_autograd 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_autograd_1.1_8ee07b7d86524ab5_.log 2025-12-04T10:38:27.2579273Z Running 5 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_autograd.py::TestFullyShardAutograd::test_nontensor_activations, test/distributed/_composable/fsdp/test_fully_shard_autograd.py::TestFullyShardAutograd::test_unused_forward_module, test/distributed/_composable/fsdp/test_fully_shard_autograd.py::TestFullyShardAutograd::test_unused_forward_output, test/distributed/_composable/fsdp/test_fully_shard_autograd.py::TestFullyShardPostAccGradHookMultiThread::test_post_acc_grad_hook_runs, test/distributed/_composable/fsdp/test_fully_shard_autograd.py::TestFullyShardPostAccGradHookMultiProcess::test_post_acc_grad_hook_optim_parity 2025-12-04T10:38:27.2583074Z 2025-12-04T10:38:27.2583597Z Finished distributed/_composable/fsdp/test_fully_shard_autograd 1/1 ... [2025-12-04 10:38:27.256699][6333.3588307], took 0.51min 2025-12-04T10:38:27.2821042Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_autograd/distributed._composable.fsdp.test_fully_shard_autograd-6a8bc02b72927b79.xml 2025-12-04T10:38:27.3851280Z Running distributed/_composable/test_composability/test_2d_composability 1/1 ... [2025-12-04 10:38:27.384467][6333.486599207] 2025-12-04T10:38:27.3852081Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:38:27.3853960Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/test_composability/test_2d_composability.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:38:27.384806] 2025-12-04T10:40:32.9203831Z 2025-12-04T10:40:32.9207541Z distributed/_composable/test_composability/test_2d_composability 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.test_composability.test_2d_composability_1.1_35a52faaad7dc617_.log 2025-12-04T10:40:32.9222232Z Running 18 items in this shard: test/distributed/_composable/test_composability/test_2d_composability.py::TestFullyShard2DTraining::test_tp_with_fsdp_offloading, test/distributed/_composable/test_composability/test_2d_composability.py::TestFullyShard2DTraining::test_train_parity_2d_mlp, test/distributed/_composable/test_composability/test_2d_composability.py::TestFullyShard2DTraining::test_train_parity_2d_transformer, test/distributed/_composable/test_composability/test_2d_composability.py::TestFullyShard2DTraining::test_train_parity_2d_transformer_checkpoint_resume, test/distributed/_composable/test_composability/test_2d_composability.py::TestFullyShard2DStateDict::test_fully_shard_tp_2d_set_full_state_dict, test/distributed/_composable/test_composability/test_2d_composability.py::Test2dFSDP1ParallelIntegration::test_2d_ddp_integration_functionality, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelTraining::test_2d_e2e_training_default, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelTraining::test_2d_e2e_training_not_use_orig_params, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelTraining::test_2d_e2e_training_use_orig_params, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelTraining::test_2d_fsdp_state_enable_extension, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelStateDict::test_2d_load_state_dict_is_even_sharded_model_False, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelStateDict::test_2d_load_state_dict_is_even_sharded_model_True, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelStateDict::test_2d_optim_state_dict_is_even_sharded_model_False, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelStateDict::test_2d_optim_state_dict_is_even_sharded_model_True, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelStateDict::test_2d_state_dict_is_even_sharded_model_False, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelStateDict::test_2d_state_dict_is_even_sharded_model_True, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelStateDict::test_fsdp1_tp_2d_set_full_state_dict, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelStateDict::test_fsdp_2d_extension 2025-12-04T10:40:32.9234913Z 2025-12-04T10:40:32.9235461Z Finished distributed/_composable/test_composability/test_2d_composability 1/1 ... [2025-12-04 10:40:32.920013][6459.02214267], took 2.09min 2025-12-04T10:40:32.9457065Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.test_composability.test_2d_composability/distributed._composable.test_composability.test_2d_composability-218cfa8a31a3ba84.xml 2025-12-04T10:40:33.0411124Z Running distributed/fsdp/test_fsdp_optim_state 1/1 ... 
[2025-12-04 10:40:33.040525][6459.142657682] 2025-12-04T10:40:33.0411772Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:40:33.0413047Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_optim_state.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:40:33.040873] 2025-12-04T10:47:04.1481623Z 2025-12-04T10:47:04.1483093Z distributed/fsdp/test_fsdp_optim_state 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_optim_state_1.1_7d67a9c100256545_.log 2025-12-04T10:47:04.1534580Z Running 60 items in this shard: test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_compatible_with_trec, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_flatten_sharded_optim_state_dict_nested, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_flatten_sharded_optim_state_dict_transformer, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_full_optim_state_dict_keys, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_full_optim_state_dict_nested_invalid, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_interface_arguments, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_no_grad, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_input_warning, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type0_use_multiple_param_groups_False_rank0_only_False_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type0_use_multiple_param_groups_False_rank0_only_False_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type0_use_multiple_param_groups_False_rank0_only_True_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type0_use_multiple_param_groups_False_rank0_only_True_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type0_use_multiple_param_groups_True_rank0_only_False_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type0_use_multiple_param_groups_True_rank0_only_False_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type0_use_multiple_param_groups_True_rank0_only_True_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type0_use_multiple_param_groups_True_rank0_only_True_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type1_use_multiple_param_groups_False_rank0_only_False_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type1_use_multiple_param_groups_False_rank0_only_False_use_diff_optim_inputs_True, 
test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type1_use_multiple_param_groups_False_rank0_only_True_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type1_use_multiple_param_groups_False_rank0_only_True_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type1_use_multiple_param_groups_True_rank0_only_False_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type1_use_multiple_param_groups_True_rank0_only_False_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type1_use_multiple_param_groups_True_rank0_only_True_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type1_use_multiple_param_groups_True_rank0_only_True_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_without_param_groups, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_rekey_optim_state_dict_to_ids_state_dict_type0_use_multiple_param_groups_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_rekey_optim_state_dict_to_ids_state_dict_type0_use_multiple_param_groups_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_rekey_optim_state_dict_to_ids_state_dict_type1_use_multiple_param_groups_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_rekey_optim_state_dict_to_ids_state_dict_type1_use_multiple_param_groups_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_rekey_optim_state_dict_to_names, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_save_load_without_0th_param_state_state_dict_type0, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_save_load_without_0th_param_state_state_dict_type1, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_halve_world_size, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_False_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_False_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_True_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_True_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_False_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_False_use_diff_optim_inputs_True, 
test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_True_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_True_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_transformer, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_halve_world_size, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_False_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_False_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_True_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_True_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_False_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_False_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_True_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_True_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_transformer, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_unmanaged_params_state_dict_type0_add_to_fsdp_module_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_unmanaged_params_state_dict_type0_add_to_fsdp_module_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_unmanaged_params_state_dict_type1_add_to_fsdp_module_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_unmanaged_params_state_dict_type1_add_to_fsdp_module_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_state_dict_with_none_tensor_state, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_use_orig_params, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_with_empty_optimizer_state, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_with_no_shard 2025-12-04T10:47:04.1577306Z 2025-12-04T10:47:04.1577729Z Finished distributed/fsdp/test_fsdp_optim_state 1/1 ... 
[2025-12-04 10:47:04.151584][6850.253711248], took 6.52min 2025-12-04T10:47:04.1775751Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_optim_state/distributed.fsdp.test_fsdp_optim_state-a3d7bfb88e0bb04b.xml 2025-12-04T10:47:04.2458247Z Running distributed/test_c10d_logger 1/1 ... [2025-12-04 10:47:04.245591][6850.347723113] 2025-12-04T10:47:04.2458849Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:47:04.2461435Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_c10d_logger.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:47:04.245953] 2025-12-04T10:47:18.0469515Z 2025-12-04T10:47:18.0470610Z distributed/test_c10d_logger 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_logger_1.1_0a52c9bc30c9920f_.log 2025-12-04T10:47:18.0472341Z Running 2 items in this shard: test/distributed/test_c10d_logger.py::C10dErrorLoggerTest::test_exception_logger, test/distributed/test_c10d_logger.py::C10dErrorLoggerTest::test_get_or_create_logger 2025-12-04T10:47:18.0473319Z 2025-12-04T10:47:18.0473691Z Finished distributed/test_c10d_logger 1/1 ... [2025-12-04 10:47:18.046453][6864.148583076], took 0.23min 2025-12-04T10:47:18.0726619Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_logger/distributed.test_c10d_logger-087942ef032695a4.xml 2025-12-04T10:47:18.1534590Z Running distributed/_composable/test_replicate_training 1/1 ... [2025-12-04 10:47:18.152871][6864.255001338] 2025-12-04T10:47:18.1535564Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:47:18.1536939Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/test_replicate_training.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:47:18.153239] 2025-12-04T10:48:58.0697632Z 2025-12-04T10:48:58.0698931Z distributed/_composable/test_replicate_training 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.test_replicate_training_1.1_cc04659fdefd418f_.log 2025-12-04T10:48:58.0710922Z Running 17 items in this shard: test/distributed/_composable/test_replicate_training.py::TestReplicateForwardInputs::test_root_move_forward_input_to_device, test/distributed/_composable/test_replicate_training.py::TestReplicateRegisteredParams::test_param_registration_after_backward, test/distributed/_composable/test_replicate_training.py::TestReplicateRegisteredParams::test_param_registration_after_forward, test/distributed/_composable/test_replicate_training.py::TestReplicateCastAfterInit::test_to_float64_after_init, test/distributed/_composable/test_replicate_training.py::TestReplicate1DTrainingCore::test_explicit_prefetching, test/distributed/_composable/test_replicate_training.py::TestReplicate1DTrainingCore::test_multi_forward_module, test/distributed/_composable/test_replicate_training.py::TestReplicate1DTrainingCore::test_non_root_forward_backward, test/distributed/_composable/test_replicate_training.py::TestReplicate1DTrainingCore::test_post_optim_event, test/distributed/_composable/test_replicate_training.py::TestReplicate1DTrainingCore::test_train_parity_multi_group_cpu_offload_eager, test/distributed/_composable/test_replicate_training.py::TestReplicate1DTrainingCore::test_train_parity_multi_groups, test/distributed/_composable/test_replicate_training.py::TestReplicate1DTrainingCore::test_train_parity_single_group, test/distributed/_composable/test_replicate_training.py::TestReplicateTrainingCompose::test_train_parity_with_activation_checkpointing, test/distributed/_composable/test_replicate_training.py::TestReplicateSharedParams::test_train_parity_with_shared_params, test/distributed/_composable/test_replicate_training.py::TestReplicateGradientAccumulation::test_1f1b_microbatching, test/distributed/_composable/test_replicate_training.py::TestReplicateGradientAccumulation::test_gradient_accumulation, test/distributed/_composable/test_replicate_training.py::TestReplicateCustomForwardMethod::test_register_fsdp_forward_method, test/distributed/_composable/test_replicate_training.py::TestReplicateTPTraining::test_replicate_tp 2025-12-04T10:48:58.0722025Z 2025-12-04T10:48:58.0722513Z Finished distributed/_composable/test_replicate_training 1/1 ... [2025-12-04 10:48:58.069688][6964.171816153], took 1.67min 2025-12-04T10:48:58.0965163Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.test_replicate_training/distributed._composable.test_replicate_training-2cbeb0e1e9d2c847.xml 2025-12-04T10:48:58.2032029Z Running distributed/optim/test_apply_optimizer_in_backward 1/1 ... [2025-12-04 10:48:58.202570][6964.304702488] 2025-12-04T10:48:58.2032751Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:48:58.2034104Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/optim/test_apply_optimizer_in_backward.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:48:58.202926] 2025-12-04T10:49:00.5876762Z 2025-12-04T10:49:00.5878048Z distributed/optim/test_apply_optimizer_in_backward 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.optim.test_apply_optimizer_in_backward_1.1_13991a8c54f44830_.log 2025-12-04T10:49:00.5879966Z 2025-12-04T10:49:00.5880464Z Finished distributed/optim/test_apply_optimizer_in_backward 1/1 ... [2025-12-04 10:49:00.587044][6966.689173304], took 0.04min 2025-12-04T10:49:00.6135301Z Running distributed/rpc/test_share_memory 1/1 ... [2025-12-04 10:49:00.612844][6966.714976498] 2025-12-04T10:49:00.6135920Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:49:00.6137286Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/rpc/test_share_memory.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:49:00.613210] 2025-12-04T10:49:11.7383473Z 2025-12-04T10:49:11.7384898Z distributed/rpc/test_share_memory 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.rpc.test_share_memory_1.1_b0f1b7712293917c_.log 2025-12-04T10:49:11.7386285Z Running 1 items in this shard: test/distributed/rpc/test_share_memory.py::TestRPCPickler::test_case 2025-12-04T10:49:11.7387202Z Running 1 items in this shard: test/distributed/rpc/test_share_memory.py::TestRPCPickler::test_case 2025-12-04T10:49:11.7387753Z 2025-12-04T10:49:11.7388142Z Finished distributed/rpc/test_share_memory 1/1 ... [2025-12-04 10:49:11.738178][6977.84030765], took 0.19min 2025-12-04T10:49:11.7646120Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.test_share_memory/distributed.rpc.test_share_memory-7afea101a44bab53.xml 2025-12-04T10:49:11.8558928Z Running distributed/tensor/test_op_strategy 1/1 ... [2025-12-04 10:49:11.855287][6977.957418935] 2025-12-04T10:49:11.8559578Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:49:11.8560848Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_op_strategy.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:49:11.855650] 2025-12-04T10:50:43.5011952Z 2025-12-04T10:50:43.5016398Z distributed/tensor/test_op_strategy 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_op_strategy_1.1_1fa2e02695839b56_.log 2025-12-04T10:50:43.5029106Z Running 24 items in this shard: test/distributed/tensor/test_op_strategy.py::TestEinsumDims::test_batch_dims, test/distributed/tensor/test_op_strategy.py::TestEinsumDims::test_bmm_dims, test/distributed/tensor/test_op_strategy.py::TestEinsumDims::test_free_dims, test/distributed/tensor/test_op_strategy.py::TestEinsumDims::test_mm_dims, test/distributed/tensor/test_op_strategy.py::TestEinsumStrategies::test_bmm_1d_mesh, test/distributed/tensor/test_op_strategy.py::TestEinsumStrategies::test_bmm_2d_mesh, test/distributed/tensor/test_op_strategy.py::TestEinsumStrategies::test_bmm_diffinndim_2d_mesh, test/distributed/tensor/test_op_strategy.py::TestEinsumStrategies::test_bmm_diffoutndim_2d_mesh, test/distributed/tensor/test_op_strategy.py::TestEinsumStrategies::test_linearity_1d_mesh, test/distributed/tensor/test_op_strategy.py::TestEinsumStrategies::test_mm_1d_mesh, test/distributed/tensor/test_op_strategy.py::TestEinsumStrategies::test_mm_2d_mesh, test/distributed/tensor/test_op_strategy.py::TestEinsumStrategies::test_pointwise_1d_mesh, test/distributed/tensor/test_op_strategy.py::TestCostModel::test_bmm_strategies, test/distributed/tensor/test_op_strategy.py::TestCostModel::test_mm_strategies, test/distributed/tensor/test_op_strategy.py::TestCostModel::test_redistribute_cost_latency, test/distributed/tensor/test_op_strategy.py::TestCostModel::test_redistribute_cost_mesh_1d, test/distributed/tensor/test_op_strategy.py::TestCostModel::test_redistribute_cost_mesh_2d, test/distributed/tensor/test_op_strategy.py::DistTensorReplicateStrategyRegistrationTest::test_replicate_strategy_placement, test/distributed/tensor/test_op_strategy.py::DistTensorReplicateStrategyRegistrationTest::test_tuple_replicate_strategy_placement, test/distributed/tensor/test_op_strategy.py::TestStrategyHashing::test_call_with_different_nontensor_args, test/distributed/tensor/test_op_strategy.py::TestStrategyOperation::test_cache_clean, test/distributed/tensor/test_op_strategy.py::DistTensorReplicateStrategyRegistrationTestWithLocalTensor::test_replicate_strategy_placement, test/distributed/tensor/test_op_strategy.py::DistTensorReplicateStrategyRegistrationTestWithLocalTensor::test_tuple_replicate_strategy_placement, test/distributed/tensor/test_op_strategy.py::TestStrategyHashingWithLocalTensor::test_call_with_different_nontensor_args 2025-12-04T10:50:43.5040831Z 2025-12-04T10:50:43.5041234Z Finished distributed/tensor/test_op_strategy 1/1 ... [2025-12-04 10:50:43.500743][7069.602857482], took 1.53min 2025-12-04T10:50:43.5275643Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_op_strategy/distributed.tensor.test_op_strategy-6fbbc916638ee901.xml 2025-12-04T10:50:43.6136133Z Running distributed/fsdp/test_fsdp_grad_acc 1/1 ... [2025-12-04 10:50:43.613241][7069.715373151] 2025-12-04T10:50:43.6136913Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:50:43.6138234Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_grad_acc.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:50:43.613609] 2025-12-04T10:51:28.7897119Z 2025-12-04T10:51:28.7898257Z distributed/fsdp/test_fsdp_grad_acc 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_grad_acc_1.1_4abfd3a00d3824ee_.log 2025-12-04T10:51:28.7902496Z Running 6 items in this shard: test/distributed/fsdp/test_fsdp_grad_acc.py::TestGradAcc::test_grad_acc_configs0_use_orig_params_False, test/distributed/fsdp/test_fsdp_grad_acc.py::TestGradAcc::test_grad_acc_configs0_use_orig_params_True, test/distributed/fsdp/test_fsdp_grad_acc.py::TestGradAcc::test_grad_acc_configs1_use_orig_params_False, test/distributed/fsdp/test_fsdp_grad_acc.py::TestGradAcc::test_grad_acc_configs1_use_orig_params_True, test/distributed/fsdp/test_fsdp_grad_acc.py::TestGradAcc::test_grad_acc_cpu_offload_use_orig_params_False, test/distributed/fsdp/test_fsdp_grad_acc.py::TestGradAcc::test_grad_acc_cpu_offload_use_orig_params_True 2025-12-04T10:51:28.7905670Z 2025-12-04T10:51:28.7906050Z Finished distributed/fsdp/test_fsdp_grad_acc 1/1 ... [2025-12-04 10:51:28.789573][7114.891701842], took 0.75min 2025-12-04T10:51:28.8161117Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_grad_acc/distributed.fsdp.test_fsdp_grad_acc-a75842029d7b9dcc.xml 2025-12-04T10:51:28.9119940Z Running distributed/checkpoint/test_state_dict_stager 1/1 ... [2025-12-04 10:51:28.911456][7115.01358755] 2025-12-04T10:51:28.9120635Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:51:28.9121960Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_state_dict_stager.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:51:28.911802] 2025-12-04T10:52:02.8129045Z 2025-12-04T10:52:02.8130278Z distributed/checkpoint/test_state_dict_stager 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_state_dict_stager_1.1_039a3cb2334b1e56_.log 2025-12-04T10:52:02.8139240Z Running 14 items in this shard: test/distributed/checkpoint/test_state_dict_stager.py::TestStateDictStager::test_caching, test/distributed/checkpoint/test_state_dict_stager.py::TestStateDictStager::test_complex_storage_sharing, test/distributed/checkpoint/test_state_dict_stager.py::TestStateDictStager::test_cpu_storage_independence, test/distributed/checkpoint/test_state_dict_stager.py::TestStateDictStager::test_dataclasses, test/distributed/checkpoint/test_state_dict_stager.py::TestStateDictStager::test_different_dtypes, test/distributed/checkpoint/test_state_dict_stager.py::TestStateDictStager::test_empty_tensors, test/distributed/checkpoint/test_state_dict_stager.py::TestStateDictStager::test_tensor_attrs, test/distributed/checkpoint/test_state_dict_stager.py::TestStateDictStager::test_tensor_pinned_and_shared, test/distributed/checkpoint/test_state_dict_stager.py::TestStateDictStager::test_views, test/distributed/checkpoint/test_state_dict_stager.py::TestDTensorStateDictStager::test_dtensor, test/distributed/checkpoint/test_state_dict_stager.py::TestReplicationStager::test_replication_basic, test/distributed/checkpoint/test_state_dict_stager.py::TestReplicationStager::test_replication_dtensors, test/distributed/checkpoint/test_state_dict_stager.py::TestReplicationStager::test_replication_persistence, test/distributed/checkpoint/test_state_dict_stager.py::TestReplicationStager::test_replication_sharded_tensors 2025-12-04T10:52:02.8146732Z 2025-12-04T10:52:02.8147176Z Finished distributed/checkpoint/test_state_dict_stager 1/1 ... [2025-12-04 10:52:02.812693][7148.914823509], took 0.57min 2025-12-04T10:52:02.8389812Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_state_dict_stager/distributed.checkpoint.test_state_dict_stager-c8decf93ed909c05.xml 2025-12-04T10:52:02.9208810Z Running distributed/fsdp/test_fsdp_freezing_weights 1/1 ... [2025-12-04 10:52:02.920392][7149.022523399] 2025-12-04T10:52:02.9209493Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:52:02.9210804Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_freezing_weights.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:52:02.920734] 2025-12-04T10:55:16.8719969Z 2025-12-04T10:55:16.8721173Z distributed/fsdp/test_fsdp_freezing_weights 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_freezing_weights_1.1_1d6984042c8f2cfc_.log 2025-12-04T10:55:16.8759213Z Running 32 items in this shard: test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_False_disable_autograd_False_forward_prefetch_False, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_False_disable_autograd_False_forward_prefetch_True, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_False_disable_autograd_True_forward_prefetch_False, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_False_disable_autograd_True_forward_prefetch_True, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_True_disable_autograd_False_forward_prefetch_False, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_True_disable_autograd_False_forward_prefetch_True, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_True_disable_autograd_True_forward_prefetch_False, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_True_disable_autograd_True_forward_prefetch_True, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_False_disable_autograd_False_forward_prefetch_False, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_False_disable_autograd_False_forward_prefetch_True, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_False_disable_autograd_True_forward_prefetch_False, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_False_disable_autograd_True_forward_prefetch_True, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_True_disable_autograd_False_forward_prefetch_False, 
test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_True_disable_autograd_False_forward_prefetch_True, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_True_disable_autograd_True_forward_prefetch_False, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_True_disable_autograd_True_forward_prefetch_True, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_False_disable_autograd_False_forward_prefetch_False, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_False_disable_autograd_False_forward_prefetch_True, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_False_disable_autograd_True_forward_prefetch_False, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_False_disable_autograd_True_forward_prefetch_True, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_True_disable_autograd_False_forward_prefetch_False, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_True_disable_autograd_False_forward_prefetch_True, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_True_disable_autograd_True_forward_prefetch_False, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_True_disable_autograd_True_forward_prefetch_True, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_False_disable_autograd_False_forward_prefetch_False, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_False_disable_autograd_False_forward_prefetch_True, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_False_disable_autograd_True_forward_prefetch_False, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_False_disable_autograd_True_forward_prefetch_True, 
test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_True_disable_autograd_False_forward_prefetch_False, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_True_disable_autograd_False_forward_prefetch_True, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_True_disable_autograd_True_forward_prefetch_False, test/distributed/fsdp/test_fsdp_freezing_weights.py::TestFreezingWeights::test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_True_disable_autograd_True_forward_prefetch_True 2025-12-04T10:55:16.8796550Z 2025-12-04T10:55:16.8796965Z Finished distributed/fsdp/test_fsdp_freezing_weights 1/1 ... [2025-12-04 10:55:16.871828][7342.973958177], took 3.23min 2025-12-04T10:55:16.8988069Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_freezing_weights/distributed.fsdp.test_fsdp_freezing_weights-c610b4e9e056a60a.xml 2025-12-04T10:55:17.3338690Z Uploading artifacts took 0.35 seconds 2025-12-04T10:55:17.3348085Z Running distributed/_composable/fsdp/test_fully_shard_init 1/1 ... [2025-12-04 10:55:17.334223][7343.436353922] 2025-12-04T10:55:17.3348800Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:55:17.3350121Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_init.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:55:17.334556] 2025-12-04T10:55:29.8276186Z 2025-12-04T10:55:29.8277710Z distributed/_composable/fsdp/test_fully_shard_init 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_init_1.1_313b8ba1dd39b14e_.log 2025-12-04T10:55:29.8305105Z Running 42 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardDeviceTensor::test_move_states_to_device_ignored_param_device, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardDeviceTensor::test_move_states_to_device_tensor, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardDeviceDTensor::test_move_states_to_device_dtensor_invalid, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardDeviceDTensor::test_move_states_to_device_dtensor_valid, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardMeshArg::test_2d_mesh_without_mesh_dim_names, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardMeshArg::test_invalid_mesh_ndim, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardManagedModulesAndStates::test_managed_modules_duplicate, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardManagedModulesAndStates::test_managed_modules_list_of_mlps, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardManagedModulesAndStates::test_managed_modules_nested, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardManagedModulesAndStates::test_managed_modules_nested_fully_shard_and_replicate, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardManagedModulesAndStates::test_managed_modules_single, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardManagedModulesAndStates::test_managed_states_list_of_mlps, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardManagedModulesAndStates::test_managed_states_nested_fully_shard, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardManagedModulesAndStates::test_managed_states_shared_params_and_buffers, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardParamModuleInfos::test_get_param_module_infos_duplicates, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardParamModuleInfos::test_get_param_module_infos_list_of_mlps, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardParamModuleInfos::test_get_param_module_infos_shared_params, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardShardedParameterTensor::test_raise_noncontiguous_parameter, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardShardedParameterTensor::test_raise_scalar_parameter, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardShardedParameterTensor::test_shard_tensor_parameters, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardShardedParameterDTensor::test_shard_dtensor_parameters, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardLazyInit::test_fully_shard_double_lazy_init, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardLazyInit::test_fully_shard_is_root, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardLazyInit::test_fully_shard_module_and_param_fqns, 
test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardLazyInit::test_fully_shard_multi_module_root, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardLazyInit::test_reset_sharded_param_in_lazy_init, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardMetaDeviceInit::test_invalid_meta_device_init, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardMetaDeviceInit::test_meta_device_1d_init, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardMetaDeviceInit::test_meta_device_2d_init, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardMetaDeviceInit::test_rank0_broadcast_meta_device_init, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardProcessGroupInit::test_1d_process_group_init, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardProcessGroupInit::test_2d_process_group_init, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardHSDPBroadcast::test_hsdp_broadcast_across_replicas, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestHSDPWithCustomHook::test_custom_hook_custom_stream, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestHSDPWithCustomHook::test_custom_hsdp_all_reduce_hook, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardShardPlacementFn::test_init_1d_transformer_shard_dim_neg1, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardShardPlacementFn::test_init_1d_transformer_shard_largest_dim, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardShardPlacementFn::test_init_1d_uneven_shard_largest_dim, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardShardPlacementFn::test_init_2d_transformer_shard_diff_dim, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardShardPlacementFn::test_invalid_shard_dim, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardOldImport::test_old_import_training, test/distributed/_composable/fsdp/test_fully_shard_init.py::TestFullyShardMixedDtypeParam::test_mixed_dtypes_no_grad_param 2025-12-04T10:55:29.8329608Z 2025-12-04T10:55:29.8330041Z Finished distributed/_composable/fsdp/test_fully_shard_init 1/1 ... [2025-12-04 10:55:29.827145][7355.929275757], took 0.21min 2025-12-04T10:55:29.8539422Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_init/distributed._composable.fsdp.test_fully_shard_init-94adf46d5612666a.xml 2025-12-04T10:55:29.9249895Z Running distributed/fsdp/test_fsdp_flatten_params 1/1 ... [2025-12-04 10:55:29.924397][7356.026529409] 2025-12-04T10:55:29.9250551Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:55:29.9251989Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_flatten_params.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:55:29.924741] 2025-12-04T10:56:36.1534992Z 2025-12-04T10:56:36.1536249Z distributed/fsdp/test_fsdp_flatten_params 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_flatten_params_1.1_076e3197ee747eb8_.log 2025-12-04T10:56:36.1545563Z Running 14 items in this shard: test/distributed/fsdp/test_fsdp_flatten_params.py::TestFlattenParams::test_empty_module, test/distributed/fsdp/test_fsdp_flatten_params.py::TestFlattenParams::test_flat_param_shard_metadata_aligned_full_precision, test/distributed/fsdp/test_fsdp_flatten_params.py::TestFlattenParams::test_flat_param_shard_metadata_aligned_mixed_precision, test/distributed/fsdp/test_fsdp_flatten_params.py::TestFlattenParams::test_flat_param_shard_metadata_unaligned, test/distributed/fsdp/test_fsdp_flatten_params.py::TestFlattenParams::test_flat_param_shard_metadata_with_memory_format_memory_format0, test/distributed/fsdp/test_fsdp_flatten_params.py::TestFlattenParams::test_flat_param_shard_metadata_with_memory_format_memory_format1, test/distributed/fsdp/test_fsdp_flatten_params.py::TestFlattenParams::test_flatten_nothing, test/distributed/fsdp/test_fsdp_flatten_params.py::TestFlattenParams::test_numel_with_shared_params, test/distributed/fsdp/test_fsdp_flatten_params.py::TestFlattenParams::test_numel_without_shared_params, test/distributed/fsdp/test_fsdp_flatten_params.py::TestFlattenParams::test_output_with_shared_params, test/distributed/fsdp/test_fsdp_flatten_params.py::TestFlattenParams::test_output_without_shared_params, test/distributed/fsdp/test_fsdp_flatten_params.py::TestFlattenParams::test_partial_flattening, test/distributed/fsdp/test_fsdp_flatten_params.py::TestFlattenParams::test_pnorm_after_step_with_shared_params, test/distributed/fsdp/test_fsdp_flatten_params.py::TestFlattenParams::test_writeback_orig_params_no_shard 2025-12-04T10:56:36.1553336Z 2025-12-04T10:56:36.1553760Z Finished distributed/fsdp/test_fsdp_flatten_params 1/1 ... [2025-12-04 10:56:36.153010][7422.255140393], took 1.10min 2025-12-04T10:56:36.1805554Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_flatten_params/distributed.fsdp.test_fsdp_flatten_params-1722984b0a3e650a.xml 2025-12-04T10:56:36.2638743Z Running distributed/test_distributed_spawn 3/9 ... [2025-12-04 10:56:36.263254][7422.36538609] 2025-12-04T10:56:36.2639497Z Running distributed tests for the test backend with env init_method 2025-12-04T10:56:36.2639984Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:56:36.2642914Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=3', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:56:36.264102] 2025-12-04T10:56:39.8410996Z 2025-12-04T10:56:39.8412153Z distributed/test_distributed_spawn 3/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_3.9_23c96a6f8ddde9df_.log 2025-12-04T10:56:39.8413936Z Running 0 items in this shard: 2025-12-04T10:56:39.8414161Z 2025-12-04T10:56:39.8415910Z Running distributed tests for the test backend with file init_method 2025-12-04T10:56:39.8417733Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:56:39.8421778Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=3', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:56:39.841981] 2025-12-04T10:56:43.4142364Z 2025-12-04T10:56:43.4143502Z distributed/test_distributed_spawn 3/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_3.9_e48186af63d3ea09_.log 2025-12-04T10:56:43.4144820Z Running 0 items in this shard: 2025-12-04T10:56:43.4145065Z 2025-12-04T10:56:43.4150313Z Running distributed tests for the mpi backend with env init_method 2025-12-04T10:56:43.5441952Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:56:43.5443605Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=3', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:56:43.543837] 2025-12-04T10:56:47.6845986Z 2025-12-04T10:56:47.6847097Z distributed/test_distributed_spawn 3/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_3.9_09a29604c4f1582f_.log 2025-12-04T10:56:47.6848175Z Running 0 items in this shard: 2025-12-04T10:56:47.6848536Z Running 0 items in this shard: 2025-12-04T10:56:47.6848883Z Running 0 items in this shard: 2025-12-04T10:56:47.6849096Z 2025-12-04T10:56:47.6850895Z Running distributed tests for the mpi backend with file init_method 2025-12-04T10:56:47.8123100Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:56:47.8128099Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=3', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:56:47.812324] 2025-12-04T10:56:51.9740316Z 2025-12-04T10:56:51.9741426Z distributed/test_distributed_spawn 3/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_3.9_cf9680aaa50be142_.log 2025-12-04T10:56:51.9742718Z Running 0 items in this shard: 2025-12-04T10:56:51.9743064Z Running 0 items in this shard: 2025-12-04T10:56:51.9743413Z Running 0 items in this shard: 2025-12-04T10:56:51.9743629Z 2025-12-04T10:56:51.9748372Z Running distributed tests for the nccl backend with env init_method 2025-12-04T10:56:51.9750040Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:56:51.9754141Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=3', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:56:51.975225] 2025-12-04T11:00:50.2902707Z 2025-12-04T11:00:50.2903818Z distributed/test_distributed_spawn 3/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_3.9_a127edfd2b1d71a7_.log 2025-12-04T11:00:50.2921662Z Running 31 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_no_rank_zero_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_without_logger, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_broadcast_buffer_via_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_buffer_hook_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_same_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_post_localSGD, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_gpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_namedtuple, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_new_tensor_in_fwd_static_graph, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_remove_autograd_hooks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_reduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_high_priority_stream, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager_param_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_object_list, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_undefined_grad_parity_unused_parameters 2025-12-04T11:00:50.2939441Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU_grad_is_view 2025-12-04T11:00:50.2940741Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_group 2025-12-04T11:00:50.2941972Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_product 2025-12-04T11:00:50.2943282Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_product 2025-12-04T11:00:50.2944517Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_sum 2025-12-04T11:00:50.2945784Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group_cuda 2025-12-04T11:00:50.2947004Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_group 2025-12-04T11:00:50.2948380Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_group_cuda 2025-12-04T11:00:50.2949619Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group 2025-12-04T11:00:50.2950740Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_group_cuda 2025-12-04T11:00:50.2951871Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo 2025-12-04T11:00:50.2953102Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_no_rank_zero_nccl 2025-12-04T11:00:50.2954349Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_full_group 2025-12-04T11:00:50.2955701Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_without_logger 2025-12-04T11:00:50.2957091Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_broadcast_buffer_via_hook 2025-12-04T11:00:50.2958322Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_buffer_hook_allreduce 2025-12-04T11:00:50.2959574Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_same_across_ranks 2025-12-04T11:00:50.2960847Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_post_localSGD 2025-12-04T11:00:50.2962047Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_gpu 2025-12-04T11:00:50.2963169Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_namedtuple 2025-12-04T11:00:50.2964393Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_new_tensor_in_fwd_static_graph 2025-12-04T11:00:50.2965641Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_remove_autograd_hooks 2025-12-04T11:00:50.2966752Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather 2025-12-04T11:00:50.2967836Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_reduce 2025-12-04T11:00:50.2969060Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_high_priority_stream 2025-12-04T11:00:50.2970314Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager_param_group 2025-12-04T11:00:50.2971665Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_grad_is_view 2025-12-04T11:00:50.2973210Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view 2025-12-04T11:00:50.2974793Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_object_list 2025-12-04T11:00:50.2976068Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_autograd_profiler 2025-12-04T11:00:50.2977474Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_undefined_grad_parity_unused_parameters 2025-12-04T11:00:50.2978235Z 2025-12-04T11:00:50.2978504Z Running distributed tests for the nccl backend with file init_method 2025-12-04T11:00:50.2979268Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:00:50.2980631Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=3', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:00:50.292015] 2025-12-04T11:04:48.3209166Z 2025-12-04T11:04:48.3210555Z distributed/test_distributed_spawn 3/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_3.9_e2f3a9d90cd48f40_.log 2025-12-04T11:04:48.3228819Z Running 31 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_no_rank_zero_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_without_logger, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_broadcast_buffer_via_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_buffer_hook_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_same_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_post_localSGD, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_gpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_namedtuple, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_new_tensor_in_fwd_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_remove_autograd_hooks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_reduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_high_priority_stream, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager_param_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_object_list, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_autograd_profiler, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_undefined_grad_parity_unused_parameters 2025-12-04T11:04:48.3246060Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU_grad_is_view 2025-12-04T11:04:48.3247380Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_group 2025-12-04T11:04:48.3248575Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_product 2025-12-04T11:04:48.3249851Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_product 2025-12-04T11:04:48.3251047Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_sum 2025-12-04T11:04:48.3252228Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group_cuda 2025-12-04T11:04:48.3253520Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_group 2025-12-04T11:04:48.3255067Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_group_cuda 2025-12-04T11:04:48.3256350Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group 2025-12-04T11:04:48.3257511Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_group_cuda 2025-12-04T11:04:48.3258673Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo 2025-12-04T11:04:48.3259965Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_no_rank_zero_nccl 2025-12-04T11:04:48.3261234Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_full_group 2025-12-04T11:04:48.3262669Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_without_logger 2025-12-04T11:04:48.3264120Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_broadcast_buffer_via_hook 2025-12-04T11:04:48.3265463Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_buffer_hook_allreduce 2025-12-04T11:04:48.3266703Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_same_across_ranks 2025-12-04T11:04:48.3268010Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_post_localSGD 2025-12-04T11:04:48.3269207Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_gpu 2025-12-04T11:04:48.3270315Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_namedtuple 2025-12-04T11:04:48.3271507Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_new_tensor_in_fwd_static_graph 2025-12-04T11:04:48.3272755Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_remove_autograd_hooks 2025-12-04T11:04:48.3273865Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather 2025-12-04T11:04:48.3274953Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_reduce 2025-12-04T11:04:48.3276148Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_high_priority_stream 2025-12-04T11:04:48.3277404Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager_param_group 2025-12-04T11:04:48.3278953Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_grad_is_view 2025-12-04T11:04:48.3280650Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view 2025-12-04T11:04:48.3282066Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_object_list 2025-12-04T11:04:48.3283343Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_autograd_profiler 2025-12-04T11:04:48.3284802Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_undefined_grad_parity_unused_parameters 2025-12-04T11:04:48.3285562Z 2025-12-04T11:04:48.3285826Z Running distributed tests for the gloo backend with env init_method 2025-12-04T11:04:48.3286328Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:04:48.3287694Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=3', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:04:48.322405] 2025-12-04T11:09:13.8534364Z 2025-12-04T11:09:13.8535496Z distributed/test_distributed_spawn 3/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_3.9_e45ee060484710ae_.log 2025-12-04T11:09:13.8553296Z Running 31 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_no_rank_zero_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_without_logger, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_broadcast_buffer_via_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_buffer_hook_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_same_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_post_localSGD, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_gpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_namedtuple, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_new_tensor_in_fwd_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_remove_autograd_hooks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_reduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_high_priority_stream, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager_param_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_object_list, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_autograd_profiler, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_undefined_grad_parity_unused_parameters 2025-12-04T11:09:13.8570634Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU_grad_is_view 2025-12-04T11:09:13.8571893Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_group 2025-12-04T11:09:13.8573090Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_product 2025-12-04T11:09:13.8574685Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_product 2025-12-04T11:09:13.8575924Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_sum 2025-12-04T11:09:13.8577112Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group_cuda 2025-12-04T11:09:13.8578373Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_group 2025-12-04T11:09:13.8580086Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_group_cuda 2025-12-04T11:09:13.8581377Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group 2025-12-04T11:09:13.8582542Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_group_cuda 2025-12-04T11:09:13.8583708Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo 2025-12-04T11:09:13.8585026Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_no_rank_zero_nccl 2025-12-04T11:09:13.8586292Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_full_group 2025-12-04T11:09:13.8587691Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_without_logger 2025-12-04T11:09:13.8589140Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_broadcast_buffer_via_hook 2025-12-04T11:09:13.8590494Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_buffer_hook_allreduce 2025-12-04T11:09:13.8591824Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_same_across_ranks 2025-12-04T11:09:13.8593056Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_post_localSGD 2025-12-04T11:09:13.8594216Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_gpu 2025-12-04T11:09:13.8595332Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_namedtuple 2025-12-04T11:09:13.8596494Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_new_tensor_in_fwd_static_graph 2025-12-04T11:09:13.8597705Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_remove_autograd_hooks 2025-12-04T11:09:13.8598779Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather 2025-12-04T11:09:13.8599935Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_reduce 2025-12-04T11:09:13.8601150Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_high_priority_stream 2025-12-04T11:09:13.8602381Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager_param_group 2025-12-04T11:09:13.8603696Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_grad_is_view 2025-12-04T11:09:13.8605137Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view 2025-12-04T11:09:13.8606469Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_object_list 2025-12-04T11:09:13.8607669Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_autograd_profiler 2025-12-04T11:09:13.8608975Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_undefined_grad_parity_unused_parameters 2025-12-04T11:09:13.8609692Z 2025-12-04T11:09:13.8609943Z Running distributed tests for the gloo backend with file init_method 2025-12-04T11:09:13.8610416Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:09:13.8611726Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=3', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:09:13.855072] 2025-12-04T11:13:39.4019383Z 2025-12-04T11:13:39.4020658Z distributed/test_distributed_spawn 3/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_3.9_7d3f43b9506a343a_.log 2025-12-04T11:13:39.4039095Z Running 31 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_no_rank_zero_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_without_logger, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_broadcast_buffer_via_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_buffer_hook_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_same_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_post_localSGD, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_gpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_namedtuple, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_new_tensor_in_fwd_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_remove_autograd_hooks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_reduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_high_priority_stream, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager_param_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_object_list, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_autograd_profiler, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_undefined_grad_parity_unused_parameters 2025-12-04T11:13:39.4056600Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU_grad_is_view 2025-12-04T11:13:39.4057965Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_group 2025-12-04T11:13:39.4059194Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_product 2025-12-04T11:13:39.4060509Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_product 2025-12-04T11:13:39.4061788Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_sum 2025-12-04T11:13:39.4062981Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group_cuda 2025-12-04T11:13:39.4064242Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_group 2025-12-04T11:13:39.4065729Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_group_cuda 2025-12-04T11:13:39.4066972Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group 2025-12-04T11:13:39.4068103Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_group_cuda 2025-12-04T11:13:39.4069234Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo 2025-12-04T11:13:39.4070465Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_no_rank_zero_nccl 2025-12-04T11:13:39.4071695Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_full_group 2025-12-04T11:13:39.4074385Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_without_logger 2025-12-04T11:13:39.4075784Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_broadcast_buffer_via_hook 2025-12-04T11:13:39.4077014Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_buffer_hook_allreduce 2025-12-04T11:13:39.4078257Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_same_across_ranks 2025-12-04T11:13:39.4080006Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_post_localSGD 2025-12-04T11:13:39.4081249Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_gpu 2025-12-04T11:13:39.4082400Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_namedtuple 2025-12-04T11:13:39.4083635Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_new_tensor_in_fwd_static_graph 2025-12-04T11:13:39.4084929Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_remove_autograd_hooks 2025-12-04T11:13:39.4086074Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather 2025-12-04T11:13:39.4087202Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_reduce 2025-12-04T11:13:39.4088438Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_high_priority_stream 2025-12-04T11:13:39.4089735Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager_param_group 2025-12-04T11:13:39.4091355Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_grad_is_view 2025-12-04T11:13:39.4092813Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view 2025-12-04T11:13:39.4094509Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_object_list 2025-12-04T11:13:39.4095839Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_autograd_profiler 2025-12-04T11:13:39.4097224Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_undefined_grad_parity_unused_parameters 2025-12-04T11:13:39.4097998Z 2025-12-04T11:13:39.4098404Z Finished distributed/test_distributed_spawn 3/9 ... 
[2025-12-04 11:13:39.403191][8445.505320152], took 17.05min 2025-12-04T11:13:39.4315557Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-30657b6825f5a9b9.xml 2025-12-04T11:13:39.4981729Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2659630a9052ba32.xml 2025-12-04T11:13:39.5233693Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-35db7c52cb4e4963.xml 2025-12-04T11:13:39.5482988Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-869317da10a8a2cd.xml 2025-12-04T11:13:39.5745720Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-02900a63a52fefe8.xml 2025-12-04T11:13:39.5953857Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-20776a9c6c63a20c.xml 2025-12-04T11:13:39.6144750Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3757b46df7c65698.xml 2025-12-04T11:13:39.6466167Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-222eea109d5ddc53.xml 2025-12-04T11:13:39.6756291Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4fbe4fa34b470fa8.xml 2025-12-04T11:13:39.7082449Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b37a4634b8fad8ea.xml 2025-12-04T11:13:39.7382344Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4758d32b2c1097e5.xml 2025-12-04T11:13:39.7676794Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f048625fe3cca682.xml 2025-12-04T11:13:39.7988385Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4dbf8960b326e96e.xml 2025-12-04T11:13:39.8256153Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-43e82ed2dcb87d3f.xml 2025-12-04T11:13:39.8553578Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5fb482cd3d23f5d7.xml 2025-12-04T11:13:39.8844028Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c4860aca019c3141.xml 2025-12-04T11:13:39.9136220Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-065ee64125ced44a.xml 2025-12-04T11:13:39.9434895Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7bccf121b18dac1f.xml 2025-12-04T11:13:39.9706126Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c0b49c26d87da987.xml 2025-12-04T11:13:39.9973346Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3693e8a7c975c90a.xml 2025-12-04T11:13:40.0256186Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-fe1b63bffb2c6a3e.xml 2025-12-04T11:13:40.0563475Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3c99c8b7e71138e8.xml 2025-12-04T11:13:40.0845158Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-989a5c6b3d6a9965.xml 2025-12-04T11:13:40.1147911Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6823ac8a1907d169.xml 2025-12-04T11:13:40.1460432Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ac5bb86a9b9f7fe.xml 2025-12-04T11:13:40.1736120Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-32f1715c008a3066.xml 2025-12-04T11:13:40.2185079Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-70519aa5531a3696.xml 2025-12-04T11:13:40.2489219Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5dfee2b8fb7ba7cc.xml 2025-12-04T11:13:40.2780489Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3caf33f6f4116297.xml 2025-12-04T11:13:40.3466828Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-087aca976836f603.xml 2025-12-04T11:13:40.3732663Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9841b7f077764a91.xml 2025-12-04T11:13:40.3998689Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c090e23a7f2fffdb.xml 2025-12-04T11:13:40.4281990Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f5459ba8e137555e.xml 2025-12-04T11:13:40.4553073Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-80a7f4a66302355b.xml 2025-12-04T11:13:40.4834724Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2d6cbfeff36754e.xml 2025-12-04T11:13:40.5101552Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f874e7939adea357.xml 2025-12-04T11:13:40.5655602Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7fc3f50bae53c154.xml 2025-12-04T11:13:40.5967590Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e0a914bf0d30c722.xml 2025-12-04T11:13:40.6267051Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0504f34b27fc1bbf.xml 2025-12-04T11:13:40.6534356Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-870b2179518ff22e.xml 2025-12-04T11:13:40.6852578Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-555ef90b62bf7ebe.xml 2025-12-04T11:13:40.7136324Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ef83d3f03358d4cd.xml 2025-12-04T11:13:40.7424519Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-782ee14ea9598dfa.xml 2025-12-04T11:13:40.7696701Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-206e0f4991cecb09.xml 2025-12-04T11:13:40.8024731Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1dfcc212a7b4183f.xml 2025-12-04T11:13:40.8382698Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-58fc1504b82abecc.xml 2025-12-04T11:13:40.8895147Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-51bfe9fe1a7f1a6c.xml 2025-12-04T11:13:40.9182119Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-442ba0fa5d15173c.xml 2025-12-04T11:13:40.9495613Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-316e66014d1168cc.xml 2025-12-04T11:13:40.9874981Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-055618da6a57a7c3.xml 2025-12-04T11:13:41.0142285Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-59c00404add2d546.xml 2025-12-04T11:13:41.0468335Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f4cf6377210f7d08.xml 2025-12-04T11:13:41.0774431Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd879e6321523d4a.xml 2025-12-04T11:13:41.1054614Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-11a49a1cb92bd713.xml 2025-12-04T11:13:41.1347391Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4d5ef940e3b8d7d9.xml 2025-12-04T11:13:41.1616434Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f8d88460db673001.xml 2025-12-04T11:13:41.1899166Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7312243f5e57fab8.xml 2025-12-04T11:13:41.2157950Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d24b6142b7eafb02.xml 2025-12-04T11:13:41.2438619Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7ddaf535a31194a8.xml 2025-12-04T11:13:41.2736295Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b1941bd1a84ade60.xml 2025-12-04T11:13:41.3032353Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b4c97a3c97a6540e.xml 2025-12-04T11:13:41.3296043Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a4f9a9719cac0e27.xml 2025-12-04T11:13:41.3616230Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ffbbd0582ae0ed08.xml 2025-12-04T11:13:41.3906864Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8fceb5aff8b3d919.xml 2025-12-04T11:13:41.4214408Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4991d9e148183f8a.xml 2025-12-04T11:13:41.4762048Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2f340536cc04891.xml 2025-12-04T11:13:41.5057892Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d54b5f82a473b96e.xml 2025-12-04T11:13:41.5335305Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-32d70eb5fbea7de4.xml 2025-12-04T11:13:41.5636604Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-690ee27b5383666e.xml 2025-12-04T11:13:41.5900931Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b61a79c2ba69c2fc.xml 2025-12-04T11:13:41.6165081Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23785637709aa6cb.xml 2025-12-04T11:13:41.6455678Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5c2a0306c654054e.xml 2025-12-04T11:13:41.6758063Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c55b916a52817402.xml 2025-12-04T11:13:41.7194526Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9018f81b0b2988f3.xml 2025-12-04T11:13:41.7466143Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7f11b61d4653bdb7.xml 2025-12-04T11:13:41.7745449Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0b135b7c4b7778b0.xml 2025-12-04T11:13:41.7994157Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5fce459c37c59c7e.xml 2025-12-04T11:13:41.8296571Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-54617e16ec702700.xml 2025-12-04T11:13:41.8564853Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2b550c1faec52346.xml 2025-12-04T11:13:41.9425455Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e030e348c4a08d77.xml 2025-12-04T11:13:41.9735838Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f3321c7d9da11433.xml 2025-12-04T11:13:42.0094974Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-84298f366cb09c62.xml 2025-12-04T11:13:42.0357407Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6de764c5b54fbe28.xml 2025-12-04T11:13:42.0645442Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a94cb9872a07c07.xml 2025-12-04T11:13:42.0945958Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-42095590a7a2fe72.xml 2025-12-04T11:13:42.1235195Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a90373c5366804c7.xml 2025-12-04T11:13:42.1533915Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-557572c546cf01dc.xml 2025-12-04T11:13:42.1914194Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3a95186e1de85626.xml 2025-12-04T11:13:42.2228911Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-96a6fe2e34c367be.xml 2025-12-04T11:13:42.2558449Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ac9393c6a20ceccf.xml 2025-12-04T11:13:42.2829248Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-76abd5633e378237.xml 2025-12-04T11:13:42.3066936Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a41c806c0c2532bc.xml 2025-12-04T11:13:42.3406150Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0fea3b9528c40c04.xml 2025-12-04T11:13:42.3715814Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-87084c93c4099c6c.xml 2025-12-04T11:13:42.3997818Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4521c7b0bbbf49e6.xml 2025-12-04T11:13:42.4253892Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2ae149566d0f8ed8.xml 2025-12-04T11:13:42.4576033Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ddafbffca1b4a2a2.xml 2025-12-04T11:13:42.4875758Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-495169b72e129e3c.xml 2025-12-04T11:13:42.5148685Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-edaf36e92a9dab8d.xml 2025-12-04T11:13:42.5427852Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5c38c0e5b27dc65d.xml 2025-12-04T11:13:42.5717264Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aaba21dbf0cfec28.xml 2025-12-04T11:13:42.6706969Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-01e27bee5aad4037.xml 2025-12-04T11:13:42.6994274Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc4d7295af7e1928.xml 2025-12-04T11:13:42.7282527Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-520313bd404147de.xml 2025-12-04T11:13:42.7575755Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a797af99437b20e.xml 2025-12-04T11:13:42.7868724Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-37d6ef52a05b7d44.xml 2025-12-04T11:13:42.8188470Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2315e451ee36d9f1.xml 2025-12-04T11:13:42.8513624Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ecd0d14f056dd21.xml 2025-12-04T11:13:42.8789634Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1beae1b9515a1c25.xml 2025-12-04T11:13:42.9067626Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78d4acbddf3c9607.xml 2025-12-04T11:13:42.9362188Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c1a59c2fbd776179.xml 2025-12-04T11:13:42.9695943Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-31b3b5953cdc94eb.xml 2025-12-04T11:13:42.9975583Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0c79eda599deb55f.xml 2025-12-04T11:13:43.0272587Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f54745fb5e614031.xml 2025-12-04T11:13:43.0575795Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8182c37c5ee80d72.xml 2025-12-04T11:13:43.0905891Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ab8cc14caa3fed1d.xml 2025-12-04T11:13:43.1236216Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ffbdb3959954615b.xml 2025-12-04T11:13:43.1747645Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cef388e39996bffb.xml 2025-12-04T11:13:43.2059215Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-969953b595bcabd5.xml 2025-12-04T11:13:43.2436928Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4f0371af0c88c192.xml 2025-12-04T11:13:43.2775159Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-06587ef8e124e088.xml 2025-12-04T11:13:43.3053377Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-85f349019e7692ee.xml 2025-12-04T11:13:43.3336533Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7da4ea5ccfcc936b.xml 2025-12-04T11:13:43.3615577Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0583cccd09b9e0c9.xml 2025-12-04T11:13:43.3916616Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-940ee9ff608e1e76.xml 2025-12-04T11:13:43.4236518Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-048493e6df7d5c5f.xml 2025-12-04T11:13:43.4547121Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9f94739f99a45e5b.xml 2025-12-04T11:13:43.4865101Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5d366f728b719f9d.xml
2025-12-04T11:13:43.5118145Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1bd04d92534f968b.xml
2025-12-04T11:13:43.5395326Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b2b4bdf6b2f4c521.xml
2025-12-04T11:13:43.5720357Z Running distributed/test_distributed_spawn 6/9 ... [2025-12-04 11:13:43.571433][8449.673564411]
2025-12-04T11:13:43.5721257Z Running distributed tests for the test backend with env init_method
2025-12-04T11:13:43.5721758Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:13:43.5723771Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=6', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:13:43.572181]
2025-12-04T11:13:47.1533524Z
2025-12-04T11:13:47.1534940Z distributed/test_distributed_spawn 6/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_6.9_3bf78f0c79d16c4d_.log
2025-12-04T11:13:47.1536034Z Running 0 items in this shard:
2025-12-04T11:13:47.1536265Z
2025-12-04T11:13:47.1537876Z Running distributed tests for the test backend with file init_method
2025-12-04T11:13:47.1540002Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:13:47.1544360Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=6', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:13:47.154221]
2025-12-04T11:13:50.7322049Z
2025-12-04T11:13:50.7323240Z distributed/test_distributed_spawn 6/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_6.9_8c693fd2b3945c2d_.log
2025-12-04T11:13:50.7324352Z Running 0 items in this shard:
2025-12-04T11:13:50.7324564Z
2025-12-04T11:13:50.7326560Z Running distributed tests for the mpi backend with env init_method
2025-12-04T11:13:50.8587237Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:13:50.8591132Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=6', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:13:50.858869]
2025-12-04T11:13:55.1545483Z
2025-12-04T11:13:55.1546832Z distributed/test_distributed_spawn 6/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_6.9_a700db43a456c251_.log
2025-12-04T11:13:55.1547866Z Running 0 items in this shard:
2025-12-04T11:13:55.1548224Z Running 0 items in this shard:
2025-12-04T11:13:55.1548552Z Running 0 items in this shard:
2025-12-04T11:13:55.1548757Z
2025-12-04T11:13:55.1553036Z Running distributed tests for the mpi backend with file init_method
2025-12-04T11:13:55.2846736Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:13:55.2848591Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=6', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:13:55.284552]
2025-12-04T11:13:59.5165266Z
2025-12-04T11:13:59.5166177Z distributed/test_distributed_spawn 6/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_6.9_e8872495f8ff89db_.log
2025-12-04T11:13:59.5167220Z Running 0 items in this shard:
2025-12-04T11:13:59.5167557Z Running 0 items in this shard:
2025-12-04T11:13:59.5168060Z Running 0 items in this shard:
2025-12-04T11:13:59.5168279Z
2025-12-04T11:13:59.5169049Z Running distributed tests for the nccl backend with env init_method
2025-12-04T11:13:59.5170730Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:13:59.5175031Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=6', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ...
[2025-12-04 11:13:59.517275] 2025-12-04T11:18:01.2833571Z 2025-12-04T11:18:01.2836548Z distributed/test_distributed_spawn 6/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_6.9_dcec8ae9e33ccbd0_.log 2025-12-04T11:18:01.2854479Z Running 31 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_non_default_stream, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_with_amp_and_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_coalesced_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_stack_tensor_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_ring_exchange_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_self_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_different_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_device, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_model_diff_shape_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_multiple_nested_unused_params_error, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_grad_as_bucket_view_set_grad_to_none, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_no_grad_as_bucket_view_no_set_grad_none, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_different_graph_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_allreduce_hang, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allgather, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_group_size_exceeds_world_size, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_with_group_param, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda_twice, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sparse_all_reduce_sum_cuda, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_multi_forward 2025-12-04T11:18:01.2872526Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient 2025-12-04T11:18:01.2874038Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_non_default_stream 2025-12-04T11:18:01.2875458Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_with_amp_and_grad_is_view 2025-12-04T11:18:01.2876723Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather 2025-12-04T11:18:01.2877868Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_coalesced_complex 2025-12-04T11:18:01.2879523Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_stack_tensor_cuda 2025-12-04T11:18:01.2880798Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_product 2025-12-04T11:18:01.2882093Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_ring_exchange_nccl 2025-12-04T11:18:01.2883476Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_self_nccl 2025-12-04T11:18:01.2884803Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_different_across_ranks 2025-12-04T11:18:01.2886027Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_device 2025-12-04T11:18:01.2887234Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_model_diff_shape_across_ranks 2025-12-04T11:18:01.2888597Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_multiple_nested_unused_params_error 2025-12-04T11:18:01.2890145Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_grad_as_bucket_view_set_grad_to_none 2025-12-04T11:18:01.2891866Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_no_grad_as_bucket_view_no_set_grad_none 2025-12-04T11:18:01.2893326Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_different_graph_across_ranks 2025-12-04T11:18:01.2894686Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object 2025-12-04T11:18:01.2895911Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_allreduce_hang 2025-12-04T11:18:01.2897213Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allgather 2025-12-04T11:18:01.2898546Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_group_size_exceeds_world_size 
2025-12-04T11:18:01.2899963Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_with_group_param 2025-12-04T11:18:01.2901269Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity 2025-12-04T11:18:01.2902515Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_min 2025-12-04T11:18:01.2903637Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_min 2025-12-04T11:18:01.2904820Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda_twice 2025-12-04T11:18:01.2906031Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter 2025-12-04T11:18:01.2907155Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_complex 2025-12-04T11:18:01.2908163Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_full_group 2025-12-04T11:18:01.2909196Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag 2025-12-04T11:18:01.2910270Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sparse_all_reduce_sum_cuda 2025-12-04T11:18:01.2911380Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_multi_forward 2025-12-04T11:18:01.2911995Z 2025-12-04T11:18:01.2912231Z Running distributed tests for the nccl backend with file init_method 2025-12-04T11:18:01.2912691Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:18:01.2913895Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=6', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:18:01.284803] 2025-12-04T11:22:03.8056795Z 2025-12-04T11:22:03.8060615Z distributed/test_distributed_spawn 6/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_6.9_f7edb6118d7b5472_.log 2025-12-04T11:22:03.8078522Z Running 31 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_non_default_stream, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_with_amp_and_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_coalesced_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_stack_tensor_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_ring_exchange_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_self_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_different_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_device, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_model_diff_shape_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_multiple_nested_unused_params_error, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_grad_as_bucket_view_set_grad_to_none, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_no_grad_as_bucket_view_no_set_grad_none, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_different_graph_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_allreduce_hang, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allgather, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_group_size_exceeds_world_size, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_with_group_param, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda_twice, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sparse_all_reduce_sum_cuda, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_multi_forward 2025-12-04T11:22:03.8097179Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient 2025-12-04T11:22:03.8098827Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_non_default_stream 2025-12-04T11:22:03.8100283Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_with_amp_and_grad_is_view 2025-12-04T11:22:03.8101573Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather 2025-12-04T11:22:03.8102762Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_coalesced_complex 2025-12-04T11:22:03.8104063Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_stack_tensor_cuda 2025-12-04T11:22:03.8105557Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_product 2025-12-04T11:22:03.8106887Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_ring_exchange_nccl 2025-12-04T11:22:03.8108124Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_self_nccl 2025-12-04T11:22:03.8109378Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_different_across_ranks 2025-12-04T11:22:03.8110538Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_device 2025-12-04T11:22:03.8111684Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_model_diff_shape_across_ranks 2025-12-04T11:22:03.8112974Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_multiple_nested_unused_params_error 2025-12-04T11:22:03.8114393Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_grad_as_bucket_view_set_grad_to_none 2025-12-04T11:22:03.8115968Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_no_grad_as_bucket_view_no_set_grad_none 2025-12-04T11:22:03.8117328Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_different_graph_across_ranks 2025-12-04T11:22:03.8118450Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object 2025-12-04T11:22:03.8119634Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_allreduce_hang 2025-12-04T11:22:03.8120845Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allgather 2025-12-04T11:22:03.8122100Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_group_size_exceeds_world_size 
2025-12-04T11:22:03.8123391Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_with_group_param 2025-12-04T11:22:03.8124621Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity 2025-12-04T11:22:03.8125802Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_min 2025-12-04T11:22:03.8126866Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_min 2025-12-04T11:22:03.8127937Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda_twice 2025-12-04T11:22:03.8128997Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter 2025-12-04T11:22:03.8130065Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_complex 2025-12-04T11:22:03.8131135Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_full_group 2025-12-04T11:22:03.8132271Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag 2025-12-04T11:22:03.8133407Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sparse_all_reduce_sum_cuda 2025-12-04T11:22:03.8134799Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_multi_forward 2025-12-04T11:22:03.8135486Z 2025-12-04T11:22:03.8135780Z Running distributed tests for the gloo backend with env init_method 2025-12-04T11:22:03.8136295Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:22:03.8137653Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=6', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:22:03.807242] 2025-12-04T11:26:23.4720588Z 2025-12-04T11:26:23.4721960Z distributed/test_distributed_spawn 6/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_6.9_b2d6c55f75e2baa8_.log 2025-12-04T11:26:23.4740778Z Running 31 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_non_default_stream, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_with_amp_and_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_coalesced_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_stack_tensor_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_ring_exchange_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_self_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_different_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_device, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_model_diff_shape_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_multiple_nested_unused_params_error, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_grad_as_bucket_view_set_grad_to_none, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_no_grad_as_bucket_view_no_set_grad_none, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_different_graph_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_allreduce_hang, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allgather, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_group_size_exceeds_world_size, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_with_group_param, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda_twice, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sparse_all_reduce_sum_cuda, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_multi_forward 2025-12-04T11:26:23.4758526Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient 2025-12-04T11:26:23.4760050Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_non_default_stream 2025-12-04T11:26:23.4761475Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_with_amp_and_grad_is_view 2025-12-04T11:26:23.4762737Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather 2025-12-04T11:26:23.4763892Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_coalesced_complex 2025-12-04T11:26:23.4765162Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_stack_tensor_cuda 2025-12-04T11:26:23.4766393Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_product 2025-12-04T11:26:23.4767682Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_ring_exchange_nccl 2025-12-04T11:26:23.4768959Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_self_nccl 2025-12-04T11:26:23.4770250Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_different_across_ranks 2025-12-04T11:26:23.4771443Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_device 2025-12-04T11:26:23.4772645Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_model_diff_shape_across_ranks 2025-12-04T11:26:23.4774251Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_multiple_nested_unused_params_error 2025-12-04T11:26:23.4775760Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_grad_as_bucket_view_set_grad_to_none 2025-12-04T11:26:23.4777399Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_no_grad_as_bucket_view_no_set_grad_none 2025-12-04T11:26:23.4779057Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_different_graph_across_ranks 2025-12-04T11:26:23.4780275Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object 2025-12-04T11:26:23.4781507Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_allreduce_hang 2025-12-04T11:26:23.4782807Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allgather 2025-12-04T11:26:23.4784222Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_group_size_exceeds_world_size 
2025-12-04T11:26:23.4785597Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_with_group_param 2025-12-04T11:26:23.4786905Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity 2025-12-04T11:26:23.4788163Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_min 2025-12-04T11:26:23.4789293Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_min 2025-12-04T11:26:23.4790483Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda_twice 2025-12-04T11:26:23.4791697Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter 2025-12-04T11:26:23.4792769Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_complex 2025-12-04T11:26:23.4794134Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_full_group 2025-12-04T11:26:23.4795265Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag 2025-12-04T11:26:23.4796441Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sparse_all_reduce_sum_cuda 2025-12-04T11:26:23.4797663Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_multi_forward 2025-12-04T11:26:23.4798335Z 2025-12-04T11:26:23.4798597Z Running distributed tests for the gloo backend with file init_method 2025-12-04T11:26:23.4799108Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:26:23.4800477Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=6', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:26:23.473518] 2025-12-04T11:30:43.3937151Z 2025-12-04T11:30:43.3938234Z distributed/test_distributed_spawn 6/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_6.9_85d7f75cf0274716_.log 2025-12-04T11:30:43.3956258Z Running 31 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_non_default_stream, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_with_amp_and_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_coalesced_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_stack_tensor_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_ring_exchange_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_self_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_different_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_device, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_model_diff_shape_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_multiple_nested_unused_params_error, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_grad_as_bucket_view_set_grad_to_none, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_no_grad_as_bucket_view_no_set_grad_none, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_different_graph_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_allreduce_hang, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allgather, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_group_size_exceeds_world_size, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_with_group_param, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda_twice, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sparse_all_reduce_sum_cuda, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_multi_forward 2025-12-04T11:30:43.3973999Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient 2025-12-04T11:30:43.3975646Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_non_default_stream 2025-12-04T11:30:43.3977103Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_with_amp_and_grad_is_view 2025-12-04T11:30:43.3978422Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather 2025-12-04T11:30:43.3979864Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_coalesced_complex 2025-12-04T11:30:43.3981183Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_stack_tensor_cuda 2025-12-04T11:30:43.3982446Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_product 2025-12-04T11:30:43.3983754Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_ring_exchange_nccl 2025-12-04T11:30:43.3985064Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_self_nccl 2025-12-04T11:30:43.3986395Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_different_across_ranks 2025-12-04T11:30:43.3987630Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_device 2025-12-04T11:30:43.3988828Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_model_diff_shape_across_ranks 2025-12-04T11:30:43.3990188Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_multiple_nested_unused_params_error 2025-12-04T11:30:43.3991888Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_grad_as_bucket_view_set_grad_to_none 2025-12-04T11:30:43.3993412Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_no_grad_as_bucket_view_no_set_grad_none 2025-12-04T11:30:43.3994771Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_different_graph_across_ranks 2025-12-04T11:30:43.3995898Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object 2025-12-04T11:30:43.3997096Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_allreduce_hang 2025-12-04T11:30:43.3998315Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allgather 2025-12-04T11:30:43.3999570Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_group_size_exceeds_world_size 
2025-12-04T11:30:43.4000854Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_with_group_param 2025-12-04T11:30:43.4002079Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity 2025-12-04T11:30:43.4003254Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_min 2025-12-04T11:30:43.4004325Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_min 2025-12-04T11:30:43.4005380Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda_twice 2025-12-04T11:30:43.4006437Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter 2025-12-04T11:30:43.4007513Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_complex 2025-12-04T11:30:43.4008590Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_full_group 2025-12-04T11:30:43.4009666Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag 2025-12-04T11:30:43.4010848Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sparse_all_reduce_sum_cuda 2025-12-04T11:30:43.4012080Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_multi_forward 2025-12-04T11:30:43.4012695Z 2025-12-04T11:30:43.4013066Z Finished distributed/test_distributed_spawn 6/9 ... 
[2025-12-04 11:30:43.394778][9469.496906953], took 17.00min 2025-12-04T11:30:43.4245263Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d045df6f00832674.xml 2025-12-04T11:30:43.5056779Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2e7babf23a98fec.xml 2025-12-04T11:30:43.5283817Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ce1d4513008f30b5.xml 2025-12-04T11:30:43.5529202Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9bfba6a18599d81e.xml 2025-12-04T11:30:43.5829020Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-89a2aaebca19d4e0.xml 2025-12-04T11:30:43.6106579Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a7bfcf341dfacd6a.xml 2025-12-04T11:30:43.6376687Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d4f854973242c798.xml 2025-12-04T11:30:43.6682540Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-567d6dba79efc754.xml 2025-12-04T11:30:43.6991046Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-32dae883b8f409b4.xml 2025-12-04T11:30:43.7296195Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f35b4f146cba39ed.xml 2025-12-04T11:30:43.7565854Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-11d385bda710106b.xml 2025-12-04T11:30:43.7868211Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-16f3fe014f6f2b90.xml 2025-12-04T11:30:43.8195101Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-314d187f8a4f1527.xml 2025-12-04T11:30:43.8468593Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-37e0f31b5d0eb001.xml 2025-12-04T11:30:43.8874486Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a10e98f2eef47033.xml 2025-12-04T11:30:43.9228322Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0373e508dc2d66c0.xml 2025-12-04T11:30:43.9534249Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2844fa97ba5d525c.xml 2025-12-04T11:30:43.9807146Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f42a82f6cff1f523.xml 2025-12-04T11:30:44.0089894Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-668dafa0f73e7876.xml 2025-12-04T11:30:44.0395826Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1bb4dfdf3a617cc5.xml 2025-12-04T11:30:44.0678109Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ec275c22b294744.xml 2025-12-04T11:30:44.0947232Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4d848c3599ac1349.xml 2025-12-04T11:30:44.1228501Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ca68035b34ae5b62.xml 2025-12-04T11:30:44.1524416Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-33f84bea341754fa.xml 2025-12-04T11:30:44.1835846Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-57896abf4e71b1aa.xml 2025-12-04T11:30:44.2354109Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4ce0111ddde5df94.xml 2025-12-04T11:30:44.2647922Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-75ed533594a3939c.xml 2025-12-04T11:30:44.2983430Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d58dadb18f5c0dbb.xml 2025-12-04T11:30:44.3239049Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-53f849c69fe671b8.xml 2025-12-04T11:30:44.3530092Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4f8ce5171d9832f1.xml 2025-12-04T11:30:44.3808142Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-caca2e381afec0ac.xml 2025-12-04T11:30:44.4107117Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-19e4e102157e6e47.xml 2025-12-04T11:30:44.4394352Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-426e6a6be0a16d20.xml 2025-12-04T11:30:44.4675923Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f27afe7768298281.xml 2025-12-04T11:30:44.4984409Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1742254d60a81949.xml 2025-12-04T11:30:44.5284055Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-731fd65d812688e2.xml 2025-12-04T11:30:44.5574494Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cb1039971f7213e1.xml 2025-12-04T11:30:44.5897407Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-20eb72c43e1aae78.xml 2025-12-04T11:30:44.6195833Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1b3c54de3705157f.xml 2025-12-04T11:30:44.6482268Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-708887f203c7efd9.xml 2025-12-04T11:30:44.6776340Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-308288814df1046f.xml 2025-12-04T11:30:44.7036369Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a9b36875380651b.xml 2025-12-04T11:30:44.7309541Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b640778d73caa6a8.xml 2025-12-04T11:30:44.7599894Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eb7fed148c64c9a0.xml 2025-12-04T11:30:44.7885382Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-73e1340249d39849.xml 2025-12-04T11:30:44.8316246Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2b53b6883ca9eced.xml 2025-12-04T11:30:44.8567657Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-588a66c8cc282acc.xml 2025-12-04T11:30:44.8851971Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e2b4e294105fc8d.xml 2025-12-04T11:30:44.9167342Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1876d2067da9cba3.xml 2025-12-04T11:30:44.9456377Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c0f841dc2b4efb2.xml 2025-12-04T11:30:44.9715362Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3a51aa631f2d9c9e.xml 2025-12-04T11:30:45.0006573Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-36fdf430bc89743e.xml 2025-12-04T11:30:45.0385360Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e97389ff3c3fed34.xml 2025-12-04T11:30:45.0695816Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7431cb10fe427793.xml 2025-12-04T11:30:45.1015608Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-300aa7c2334a8d69.xml 2025-12-04T11:30:45.1308659Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-59ebecc4d72f0d7e.xml 2025-12-04T11:30:45.1595070Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8cd9ca1fa3ae4b5b.xml 2025-12-04T11:30:45.1879948Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4f733a13fa18eefb.xml 2025-12-04T11:30:45.2204928Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ba067e3c1dbc48f6.xml 2025-12-04T11:30:45.2505940Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5982d44c9781d305.xml 2025-12-04T11:30:45.2794438Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d3cdd5c6558c74f8.xml 2025-12-04T11:30:45.3056779Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9abbb37614212e4e.xml 2025-12-04T11:30:45.3576081Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c0b61128e53994e8.xml 2025-12-04T11:30:45.3868107Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-efe3817fcb01a456.xml 2025-12-04T11:30:45.4257035Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-406d537981e1a2b6.xml 2025-12-04T11:30:45.4554272Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c93bf90c971a4a89.xml 2025-12-04T11:30:45.4896988Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-63e194c9a12582ed.xml 2025-12-04T11:30:45.5216268Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e97ed0330e85ad8d.xml 2025-12-04T11:30:45.5806023Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b1f0e24e21eda42b.xml 2025-12-04T11:30:45.6087146Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8b16af08a6bda590.xml 2025-12-04T11:30:45.6396331Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d719be89195db636.xml 2025-12-04T11:30:45.6667307Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-53d64dcee99de118.xml 2025-12-04T11:30:45.7043037Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ea452bf92699d1d7.xml 2025-12-04T11:30:45.7368012Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b0d1b37af5093171.xml 2025-12-04T11:30:45.7705450Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a15143db106f15f7.xml 2025-12-04T11:30:45.8468851Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bf396dfb3b4b2bb4.xml 2025-12-04T11:30:45.8805302Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bb35081df35448e3.xml 2025-12-04T11:30:45.9146049Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-35b1275cb510efb5.xml 2025-12-04T11:30:45.9477034Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e9367509f991dbb3.xml 2025-12-04T11:30:45.9766435Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1bd7d011a01059ce.xml 2025-12-04T11:30:46.0066264Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3dc760c16caaf0c2.xml 2025-12-04T11:30:46.0336288Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-77a48723971bcff5.xml 2025-12-04T11:30:46.0669203Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-db5717cebce92ef6.xml 2025-12-04T11:30:46.1055915Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8eeacffd41be6e39.xml 2025-12-04T11:30:46.1397647Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ebed908b36444f09.xml 2025-12-04T11:30:46.1668356Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7ef96dc46d436334.xml 2025-12-04T11:30:46.1946514Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ab7428bbadf34c3f.xml 2025-12-04T11:30:46.2278173Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c50e50b95ac4e028.xml 2025-12-04T11:30:46.2616522Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a38147e023273460.xml 2025-12-04T11:30:46.3181143Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-95d5c860934b2c1e.xml 2025-12-04T11:30:46.3429154Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-25a3f0e33205a56a.xml 2025-12-04T11:30:46.3882674Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7eb201dfe29a7ff2.xml 2025-12-04T11:30:46.4182394Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-50b8421dbc1badac.xml 2025-12-04T11:30:46.4495995Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-11a93d269a83940f.xml 2025-12-04T11:30:46.4797855Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-674b848f32484d1e.xml 2025-12-04T11:30:46.5055057Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4121a01708986f89.xml 2025-12-04T11:30:46.5346098Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-994ce7e474295a12.xml 2025-12-04T11:30:46.5597474Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee469c21c041c7a4.xml 2025-12-04T11:30:46.5874637Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3246d443991c34d3.xml 2025-12-04T11:30:46.6208713Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-50ac6d99d2e163f0.xml 2025-12-04T11:30:46.6508880Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-98ebf2381cda380a.xml 2025-12-04T11:30:46.7349101Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-02a17fffec5d92e9.xml 2025-12-04T11:30:46.7627011Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22fa4be36d0f0fec.xml 2025-12-04T11:30:46.7955985Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b20781e7e3b68cd0.xml 2025-12-04T11:30:46.8394592Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-055ede09a7156885.xml 2025-12-04T11:30:46.8746065Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2dc318e9e7472b33.xml 2025-12-04T11:30:46.9076880Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f6d75222f185b4d5.xml 2025-12-04T11:30:46.9364752Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c6a79d484e935ea.xml 2025-12-04T11:30:46.9981719Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-723363f561b2e381.xml 2025-12-04T11:30:47.0306115Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cfa70be2e3e2e968.xml 2025-12-04T11:30:47.0675315Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c34de188341d059e.xml 2025-12-04T11:30:47.1078344Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f3f32df0d2792ec.xml 2025-12-04T11:30:47.1397207Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ae835e0e1dbfe3ec.xml 2025-12-04T11:30:47.1708243Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41f83edcff84215a.xml 2025-12-04T11:30:47.2005335Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a860decd20224034.xml 2025-12-04T11:30:47.2356147Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3963d6f58e4804f4.xml 2025-12-04T11:30:47.2656687Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b2893112b7e4a0fd.xml 2025-12-04T11:30:47.3026679Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5ae1d3298b84e247.xml 2025-12-04T11:30:47.3342915Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ce25bc0bb1b2125d.xml 2025-12-04T11:30:47.3716533Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b733f78747bdf8ab.xml 2025-12-04T11:30:47.4028637Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f13a1853be14fff2.xml 2025-12-04T11:30:47.4324080Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e0c8a225205b2b1b.xml 2025-12-04T11:30:47.4606412Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e42484012944fc8.xml 2025-12-04T11:30:47.4945339Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-76186faceb0b0d55.xml 2025-12-04T11:30:47.5244816Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a9b9276398f16f1.xml 2025-12-04T11:30:47.5564090Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c78e67ea2df6209d.xml 2025-12-04T11:30:47.5866631Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7358e80ff8c5c8ba.xml 2025-12-04T11:30:47.6206785Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6b10a6c1a1360dd8.xml 2025-12-04T11:30:47.6948823Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4cf0679388a3449b.xml 2025-12-04T11:30:47.7266674Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-26580fbd1387c903.xml 2025-12-04T11:30:48.2832938Z Uploading artifacts took 0.53 seconds 2025-12-04T11:30:48.2833981Z Running distributed/test_distributed_spawn 9/9 ... [2025-12-04 11:30:48.283252][9474.385382552] 2025-12-04T11:30:48.2836237Z Running distributed tests for the test backend with env init_method 2025-12-04T11:30:48.2838587Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:30:48.2842443Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=9', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:30:48.284065] 2025-12-04T11:30:51.8680208Z 2025-12-04T11:30:51.8681308Z distributed/test_distributed_spawn 9/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_9.9_05322304b4268f45_.log 2025-12-04T11:30:51.8682400Z Running 0 items in this shard: 2025-12-04T11:30:51.8682617Z 2025-12-04T11:30:51.8683952Z Running distributed tests for the test backend with file init_method 2025-12-04T11:30:51.8685663Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:30:51.8689836Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=9', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:30:51.868785] 2025-12-04T11:30:55.4516685Z 2025-12-04T11:30:55.4517796Z distributed/test_distributed_spawn 9/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_9.9_e1495444ab48b4d4_.log 2025-12-04T11:30:55.4518875Z Running 0 items in this shard: 2025-12-04T11:30:55.4519087Z 2025-12-04T11:30:55.4520054Z Running distributed tests for the mpi backend with env init_method 2025-12-04T11:30:55.5819373Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:30:55.5822909Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=9', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:30:55.582054] 2025-12-04T11:30:59.8575925Z 2025-12-04T11:30:59.8577045Z distributed/test_distributed_spawn 9/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_9.9_9a435d3db962ff09_.log 2025-12-04T11:30:59.8578136Z Running 0 items in this shard: 2025-12-04T11:30:59.8578467Z Running 0 items in this shard: 2025-12-04T11:30:59.8579061Z Running 0 items in this shard: 2025-12-04T11:30:59.8579273Z 2025-12-04T11:30:59.8582420Z Running distributed tests for the mpi backend with file init_method 2025-12-04T11:30:59.9843488Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:30:59.9845242Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=9', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:30:59.984114] 2025-12-04T11:31:04.2539988Z 2025-12-04T11:31:04.2541132Z distributed/test_distributed_spawn 9/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_9.9_8a0f3583325b9bd1_.log 2025-12-04T11:31:04.2542202Z Running 0 items in this shard: 2025-12-04T11:31:04.2542532Z Running 0 items in this shard: 2025-12-04T11:31:04.2543125Z Running 0 items in this shard: 2025-12-04T11:31:04.2543331Z 2025-12-04T11:31:04.2547976Z Running distributed tests for the nccl backend with env init_method 2025-12-04T11:31:04.2549662Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:31:04.2553638Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=9', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:31:04.255176] 2025-12-04T11:34:27.0916491Z 2025-12-04T11:34:27.0917737Z distributed/test_distributed_spawn 9/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_9.9_b9cbe2f6b7b91336_.log 2025-12-04T11:34:27.0933516Z Running 28 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_coalesced_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_average_parameters, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_grad_as_bucket_view_false, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce_process_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_model_diff_num_params_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_new_tensor_in_fwd, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_returns_tensor_with_no_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_static_graph_nested_types, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_checks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank_size_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_wait_all_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_by_enumeration_negative_input_rank, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_scatter_tensor_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_checks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_with_logger 2025-12-04T11:34:27.0949388Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_coalesced_group 2025-12-04T11:34:27.0950580Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_full_group 
2025-12-04T11:34:27.0951884Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_sum 2025-12-04T11:34:27.0953116Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_sum 2025-12-04T11:34:27.0954259Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_min 2025-12-04T11:34:27.0955425Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_complex 2025-12-04T11:34:27.0956611Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_cuda_complex 2025-12-04T11:34:27.0957855Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda 2025-12-04T11:34:27.0959075Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_average_parameters 2025-12-04T11:34:27.0960248Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_full_group 2025-12-04T11:34:27.0961575Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_grad_as_bucket_view_false 2025-12-04T11:34:27.0963020Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce_process_group 2025-12-04T11:34:27.0964367Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_model_diff_num_params_across_ranks 2025-12-04T11:34:27.0965610Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_new_tensor_in_fwd 2025-12-04T11:34:27.0966809Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_returns_tensor_with_no_grad 2025-12-04T11:34:27.0968066Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_static_graph_nested_types 2025-12-04T11:34:27.0969260Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_checks 2025-12-04T11:34:27.0970371Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank_size_group 2025-12-04T11:34:27.0971565Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_wait_all_ranks 2025-12-04T11:34:27.0972914Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_by_enumeration_negative_input_rank 2025-12-04T11:34:27.0974478Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum 2025-12-04T11:34:27.0975652Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_max 2025-12-04T11:34:27.0976845Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_scatter_tensor_cuda 2025-12-04T11:34:27.0978026Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_checks 2025-12-04T11:34:27.0979478Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex 2025-12-04T11:34:27.0980602Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv 2025-12-04T11:34:27.0981800Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_torch_profiler 2025-12-04T11:34:27.0983129Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_with_logger 2025-12-04T11:34:27.0983923Z 2025-12-04T11:34:27.0984177Z Running distributed tests for the nccl backend with file init_method 2025-12-04T11:34:27.0984693Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:34:27.0986042Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=9', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:34:27.093002] 2025-12-04T11:37:49.3365985Z 2025-12-04T11:37:49.3367084Z distributed/test_distributed_spawn 9/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_9.9_5b3b256f80a196fa_.log 2025-12-04T11:37:49.3383439Z Running 28 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_coalesced_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_average_parameters, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_grad_as_bucket_view_false, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce_process_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_model_diff_num_params_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_new_tensor_in_fwd, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_returns_tensor_with_no_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_static_graph_nested_types, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_checks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank_size_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_wait_all_ranks, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_by_enumeration_negative_input_rank, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_scatter_tensor_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_checks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_with_logger 2025-12-04T11:37:49.3399011Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_coalesced_group 2025-12-04T11:37:49.3400209Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_full_group 2025-12-04T11:37:49.3401484Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_sum 2025-12-04T11:37:49.3402716Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_sum 2025-12-04T11:37:49.3403859Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_min 2025-12-04T11:37:49.3405021Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_complex 2025-12-04T11:37:49.3406203Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_cuda_complex 2025-12-04T11:37:49.3407440Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda 2025-12-04T11:37:49.3408666Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_average_parameters 2025-12-04T11:37:49.3409837Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_full_group 2025-12-04T11:37:49.3411167Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_grad_as_bucket_view_false 2025-12-04T11:37:49.3412610Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce_process_group 2025-12-04T11:37:49.3414209Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_model_diff_num_params_across_ranks 2025-12-04T11:37:49.3415492Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_new_tensor_in_fwd 2025-12-04T11:37:49.3416736Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_returns_tensor_with_no_grad 2025-12-04T11:37:49.3418070Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_static_graph_nested_types 
2025-12-04T11:37:49.3419272Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_checks 2025-12-04T11:37:49.3420412Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank_size_group 2025-12-04T11:37:49.3421639Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_wait_all_ranks 2025-12-04T11:37:49.3423041Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_by_enumeration_negative_input_rank 2025-12-04T11:37:49.3424379Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum 2025-12-04T11:37:49.3425545Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_max 2025-12-04T11:37:49.3426818Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_scatter_tensor_cuda 2025-12-04T11:37:49.3427992Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_checks 2025-12-04T11:37:49.3429102Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex 2025-12-04T11:37:49.3430193Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv 2025-12-04T11:37:49.3431351Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_torch_profiler 2025-12-04T11:37:49.3432670Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_with_logger 2025-12-04T11:37:49.3433393Z 2025-12-04T11:37:49.3433635Z Running distributed tests for the gloo backend with env init_method 2025-12-04T11:37:49.3434131Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:37:49.3435447Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=9', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:37:49.337978] 2025-12-04T11:41:50.9804513Z 2025-12-04T11:41:50.9805891Z distributed/test_distributed_spawn 9/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_9.9_46e43edcce57a01b_.log 2025-12-04T11:41:50.9825076Z Running 28 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_coalesced_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_average_parameters, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_grad_as_bucket_view_false, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce_process_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_model_diff_num_params_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_new_tensor_in_fwd, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_returns_tensor_with_no_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_static_graph_nested_types, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_checks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank_size_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_wait_all_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_by_enumeration_negative_input_rank, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_scatter_tensor_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_checks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_with_logger 2025-12-04T11:41:50.9840612Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_coalesced_group 2025-12-04T11:41:50.9841915Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_full_group 
2025-12-04T11:41:50.9843119Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_sum 2025-12-04T11:41:50.9844336Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_sum 2025-12-04T11:41:50.9845480Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_min 2025-12-04T11:41:50.9846626Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_complex 2025-12-04T11:41:50.9847800Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_cuda_complex 2025-12-04T11:41:50.9849036Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda 2025-12-04T11:41:50.9850274Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_average_parameters 2025-12-04T11:41:50.9851439Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_full_group 2025-12-04T11:41:50.9852801Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_grad_as_bucket_view_false 2025-12-04T11:41:50.9854494Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce_process_group 2025-12-04T11:41:50.9855880Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_model_diff_num_params_across_ranks 2025-12-04T11:41:50.9857163Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_new_tensor_in_fwd 2025-12-04T11:41:50.9858421Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_returns_tensor_with_no_grad 2025-12-04T11:41:50.9859751Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_static_graph_nested_types 2025-12-04T11:41:50.9860958Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_checks 2025-12-04T11:41:50.9862104Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank_size_group 2025-12-04T11:41:50.9863349Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_wait_all_ranks 2025-12-04T11:41:50.9864728Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_by_enumeration_negative_input_rank 2025-12-04T11:41:50.9866160Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum 2025-12-04T11:41:50.9867293Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_max 2025-12-04T11:41:50.9868450Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_scatter_tensor_cuda 2025-12-04T11:41:50.9869617Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_checks 2025-12-04T11:41:50.9870738Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex 2025-12-04T11:41:50.9871830Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv 2025-12-04T11:41:50.9872994Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_torch_profiler 2025-12-04T11:41:50.9874322Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_with_logger 2025-12-04T11:41:50.9875055Z 2025-12-04T11:41:50.9875304Z Running distributed tests for the gloo backend with file init_method 2025-12-04T11:41:50.9875813Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:41:50.9877131Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=9', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:41:50.982053] 2025-12-04T11:45:53.0649185Z 2025-12-04T11:45:53.0652047Z distributed/test_distributed_spawn 9/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_9.9_b2ce0dce90fd794f_.log 2025-12-04T11:45:53.0668361Z Running 28 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_coalesced_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_average_parameters, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_grad_as_bucket_view_false, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce_process_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_model_diff_num_params_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_new_tensor_in_fwd, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_returns_tensor_with_no_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_static_graph_nested_types, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_checks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank_size_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_wait_all_ranks, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_by_enumeration_negative_input_rank, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_scatter_tensor_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_checks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_with_logger 2025-12-04T11:45:53.0684118Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_coalesced_group 2025-12-04T11:45:53.0685447Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_full_group 2025-12-04T11:45:53.0686698Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_sum 2025-12-04T11:45:53.0687966Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_sum 2025-12-04T11:45:53.0689144Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_min 2025-12-04T11:45:53.0690324Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_complex 2025-12-04T11:45:53.0691658Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_cuda_complex 2025-12-04T11:45:53.0692904Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda 2025-12-04T11:45:53.0694382Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_average_parameters 2025-12-04T11:45:53.0695589Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_full_group 2025-12-04T11:45:53.0697018Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_grad_as_bucket_view_false 2025-12-04T11:45:53.0698476Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce_process_group 2025-12-04T11:45:53.0699877Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_model_diff_num_params_across_ranks 2025-12-04T11:45:53.0701175Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_new_tensor_in_fwd 2025-12-04T11:45:53.0702472Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_returns_tensor_with_no_grad 2025-12-04T11:45:53.0703766Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_static_graph_nested_types 
2025-12-04T11:45:53.0704975Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_checks 2025-12-04T11:45:53.0706333Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank_size_group 2025-12-04T11:45:53.0707508Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_wait_all_ranks 2025-12-04T11:45:53.0708811Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_by_enumeration_negative_input_rank 2025-12-04T11:45:53.0710078Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum 2025-12-04T11:45:53.0711182Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_max 2025-12-04T11:45:53.0712351Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_scatter_tensor_cuda 2025-12-04T11:45:53.0713476Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_checks 2025-12-04T11:45:53.0714546Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex 2025-12-04T11:45:53.0715613Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv 2025-12-04T11:45:53.0716778Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_torch_profiler 2025-12-04T11:45:53.0718052Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_with_logger 2025-12-04T11:45:53.0718751Z 2025-12-04T11:45:53.0719134Z Finished distributed/test_distributed_spawn 9/9 ... 
[2025-12-04 11:45:53.065764][10379.167893149], took 15.08min 2025-12-04T11:45:53.0962345Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-386d51a44811a37c.xml 2025-12-04T11:45:53.1776651Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a4c267fd423ef4fb.xml 2025-12-04T11:45:53.2034240Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-422923c4a3ff9000.xml 2025-12-04T11:45:53.2306301Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cefb54cda6fbbd60.xml 2025-12-04T11:45:53.2633446Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1caa1cefc24b5566.xml 2025-12-04T11:45:53.2895507Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d6f460f0a05f7b70.xml 2025-12-04T11:45:53.3191341Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7b2ab90303eda495.xml 2025-12-04T11:45:53.3465633Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-15ad2d4900971da8.xml 2025-12-04T11:45:53.3775912Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8bcbe99e24fef6d4.xml 2025-12-04T11:45:53.4073492Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc5956b0c1d4a301.xml 2025-12-04T11:45:53.4494835Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f4b951168e9fd206.xml 2025-12-04T11:45:53.4835723Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-34586d84d9e8b574.xml 2025-12-04T11:45:53.5134872Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-346f56e4dab409e9.xml 2025-12-04T11:45:53.5751867Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-86debbf241b646f7.xml 2025-12-04T11:45:53.6076007Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-82deea2afca27b05.xml 2025-12-04T11:45:53.6356362Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-68e170c9c9399262.xml 2025-12-04T11:45:53.6637026Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0cb408a515a3366a.xml 2025-12-04T11:45:53.6957602Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-753ec666c5362e2c.xml 2025-12-04T11:45:53.7206370Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c6738b863174a943.xml 2025-12-04T11:45:53.7467856Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0285e1ff7eb488da.xml 2025-12-04T11:45:53.7765020Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-10c387b6e730ad52.xml 2025-12-04T11:45:53.8064166Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ffaa3570c6b46738.xml 2025-12-04T11:45:53.8373591Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9d71dba8115f24b8.xml 2025-12-04T11:45:53.8654209Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b8b4b6bac082ff37.xml 2025-12-04T11:45:53.8997235Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e2c322ea7f3ebbd2.xml 2025-12-04T11:45:53.9282878Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f03e83663e9a4601.xml 2025-12-04T11:45:53.9567203Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6b48b6a282b4704f.xml 2025-12-04T11:45:53.9933484Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7cfe16ef8c24bdf2.xml 2025-12-04T11:45:54.0235578Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-53f3ffdc1525a0fa.xml 2025-12-04T11:45:54.0534393Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4005fef94f6d8aae.xml 2025-12-04T11:45:54.0814977Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6e8960fc72004342.xml 2025-12-04T11:45:54.1095823Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2756d0ca7e2d02f0.xml 2025-12-04T11:45:54.1416356Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-badcb6057cbf1c54.xml 2025-12-04T11:45:54.1666089Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-95a798b363d84da6.xml 2025-12-04T11:45:54.2047258Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a7b490db4a9340e.xml 2025-12-04T11:45:54.2335902Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4a0629607c3fa5fd.xml 2025-12-04T11:45:54.2621900Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5dbb3e619be4e12b.xml 2025-12-04T11:45:54.2914275Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ded592e1a5a858e0.xml 2025-12-04T11:45:54.3231565Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5bb87080fbf758d2.xml 2025-12-04T11:45:54.3597886Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9e5898fa75a7f3ef.xml 2025-12-04T11:45:54.3905989Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b16de0301895bd76.xml 2025-12-04T11:45:54.4209419Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2cc94a088d38104.xml 2025-12-04T11:45:54.4536294Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c2b712ac61ede43d.xml 2025-12-04T11:45:54.4825015Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc30b3da1a421347.xml 2025-12-04T11:45:54.5165346Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9fe2cef34c40d5eb.xml 2025-12-04T11:45:54.5454505Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c658fa95ccd2cf6a.xml 2025-12-04T11:45:54.5816632Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-19a881494eafddb6.xml 2025-12-04T11:45:54.6084227Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-95816b9cbefdfdc1.xml 2025-12-04T11:45:54.6367921Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2148875896839649.xml 2025-12-04T11:45:54.6676433Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f0127aecf34ab538.xml 2025-12-04T11:45:54.7233366Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f0d23338c1bb516d.xml 2025-12-04T11:45:54.7536916Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8fc91952c534b6f9.xml 2025-12-04T11:45:54.7816243Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-335bee712b3a4821.xml 2025-12-04T11:45:54.8146622Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f1537c74fba16ec9.xml 2025-12-04T11:45:54.8438236Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f23e70d3f89edf0.xml 2025-12-04T11:45:54.8765395Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-148183526f50032e.xml 2025-12-04T11:45:54.9055709Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8190f13f98ccf625.xml 2025-12-04T11:45:54.9421341Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d1fbcdbdafd7eb07.xml 2025-12-04T11:45:54.9702548Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8debf7e3f937383c.xml 2025-12-04T11:45:55.0034444Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-42a201482bdd75d2.xml 2025-12-04T11:45:55.0332091Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7d9c4803d3c82177.xml 2025-12-04T11:45:55.0634454Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-70ca556779e39720.xml 2025-12-04T11:45:55.1030573Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3d96eaef94da7418.xml 2025-12-04T11:45:55.1357145Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3dacf547cad6a67d.xml 2025-12-04T11:45:55.1656296Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3616b6f968436ee1.xml 2025-12-04T11:45:55.1967747Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-622c1eaa2fb92c9d.xml 2025-12-04T11:45:55.2264961Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-306192510a63b544.xml 2025-12-04T11:45:55.2517067Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-82095872e2babc1f.xml 2025-12-04T11:45:55.2837756Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c84a9abadf4e6456.xml 2025-12-04T11:45:55.3125078Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc9893ace33f3830.xml 2025-12-04T11:45:55.3499146Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-daefb987452748c4.xml 2025-12-04T11:45:55.3794377Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3092859278d7bcb6.xml 2025-12-04T11:45:55.4155269Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f10308d532e69d5.xml 2025-12-04T11:45:55.4456177Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6ce015a4b2872362.xml 2025-12-04T11:45:55.4716740Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-224e5072b13b6f72.xml 2025-12-04T11:45:55.5053955Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-722d50f6db17be0a.xml 2025-12-04T11:45:55.5566703Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3132a5f3cbdc8d40.xml 2025-12-04T11:45:55.5882131Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c2952957ec5e4941.xml 2025-12-04T11:45:55.6220603Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-088131fd2e1eb740.xml 2025-12-04T11:45:55.6532940Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e35a0f5543ab7ba.xml 2025-12-04T11:45:55.6836819Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cec746d21c82a41d.xml 2025-12-04T11:45:55.7131652Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-33aec040946b9fff.xml 2025-12-04T11:45:55.7393424Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6714c1f1957114b9.xml 2025-12-04T11:45:55.7746218Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6c5fb352bbe9e8c7.xml 2025-12-04T11:45:55.8126872Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-308f137ee98c9d9b.xml 2025-12-04T11:45:55.8433690Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5cf923d1d132c738.xml 2025-12-04T11:45:55.8814547Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e9017a66c842280.xml 2025-12-04T11:45:55.9075819Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52008c5414012f53.xml 2025-12-04T11:45:55.9438856Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-707c869a600b32c4.xml 2025-12-04T11:45:55.9682114Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9afcc52ba56cb7cd.xml 2025-12-04T11:45:56.0059109Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a5db097ce06fd3ba.xml 2025-12-04T11:45:56.0346953Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5f966f935cdcc510.xml 2025-12-04T11:45:56.0665417Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d4a30815173ad737.xml 2025-12-04T11:45:56.0976428Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41db9b895b480d7b.xml 2025-12-04T11:45:56.1305757Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78a881039e14b5b2.xml 2025-12-04T11:45:56.1605784Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dd62c820cf2113b8.xml 2025-12-04T11:45:56.1917945Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c61700ab79aac30.xml 2025-12-04T11:45:56.2176324Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aae9f946365ef7f7.xml 2025-12-04T11:45:56.2506461Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d551f94bc57d8aff.xml 2025-12-04T11:45:56.2827141Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dcae987a61c4873a.xml 2025-12-04T11:45:56.3095671Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ca6886c3086a09a1.xml 2025-12-04T11:45:56.4127460Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aa0548b0a0f16fd5.xml 2025-12-04T11:45:56.4407372Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b837bfe0a5661087.xml 2025-12-04T11:45:56.4759097Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-42e3e45e9107b083.xml 2025-12-04T11:45:56.5150219Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2e64285e29d874e0.xml 2025-12-04T11:45:56.5544629Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4a385e35494820f4.xml 2025-12-04T11:45:56.6184389Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9cc203997a753f09.xml 2025-12-04T11:45:56.6545775Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0913cee8f07dc0af.xml 2025-12-04T11:45:56.6873825Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f5fb0cf096433beb.xml 2025-12-04T11:45:56.7262342Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f4d28c8a1a46915c.xml 2025-12-04T11:45:56.7564565Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b653d29a66c2470a.xml 2025-12-04T11:45:56.7867145Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e7285ab6b5c9527e.xml 2025-12-04T11:45:56.8198725Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a54941f41b8cfe8.xml 2025-12-04T11:45:56.8507895Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-91e59cc2a798549a.xml 2025-12-04T11:45:56.8847586Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e72c23544464af54.xml 2025-12-04T11:45:56.9147457Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-45f80f9c137d75a5.xml 2025-12-04T11:45:56.9461946Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9d93447ba16fc454.xml 2025-12-04T11:45:56.9775957Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-24191bbc1a349edf.xml 2025-12-04T11:45:57.0126364Z Running distributed/test_composability 1/1 ... [2025-12-04 11:45:57.012019][10383.114151031] 2025-12-04T11:45:57.0127091Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:45:57.0128482Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_composability.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:45:57.012356] 2025-12-04T11:46:14.6738331Z 2025-12-04T11:46:14.6739457Z distributed/test_composability 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_composability_1.1_47908043dcf692eb_.log 2025-12-04T11:46:14.6747303Z Running 13 items in this shard: test/distributed/test_composability.py::ComposabilityTest::test_pp_ddp_ScheduleClass0, test/distributed/test_composability.py::ComposabilityTest::test_pp_ddp_ScheduleClass1, test/distributed/test_composability.py::ComposabilityTest::test_pp_ddp_ScheduleClass2, test/distributed/test_composability.py::ComposabilityTest::test_pp_fsdp_dp_type_FSDP_MP_ScheduleClass0, test/distributed/test_composability.py::ComposabilityTest::test_pp_fsdp_dp_type_FSDP_MP_ScheduleClass1, test/distributed/test_composability.py::ComposabilityTest::test_pp_fsdp_dp_type_FSDP_MP_ScheduleClass2, test/distributed/test_composability.py::ComposabilityTest::test_pp_fsdp_dp_type_FSDP_MP_ScheduleClass3, test/distributed/test_composability.py::ComposabilityTest::test_pp_fsdp_dp_type_FSDP_ScheduleClass0, test/distributed/test_composability.py::ComposabilityTest::test_pp_fsdp_dp_type_FSDP_ScheduleClass1, test/distributed/test_composability.py::ComposabilityTest::test_pp_fsdp_dp_type_FSDP_ScheduleClass2, test/distributed/test_composability.py::ComposabilityTest::test_pp_fsdp_dp_type_FSDP_ScheduleClass3, test/distributed/test_composability.py::ComposabilityTest::test_pp_fsdp_unshard_reshard_runtime_dp_type_FSDP, test/distributed/test_composability.py::ComposabilityTest::test_pp_fsdp_unshard_reshard_runtime_dp_type_FSDP_MP 2025-12-04T11:46:14.6754097Z 2025-12-04T11:46:14.6754484Z Finished distributed/test_composability 1/1 ... [2025-12-04 11:46:14.673658][10400.775788549], took 0.29min 2025-12-04T11:46:14.7041776Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_composability/distributed.test_composability-4dcd79eb001aa4cf.xml 2025-12-04T11:46:14.8253011Z Running distributed/test_multi_threaded_pg 1/1 ... [2025-12-04 11:46:14.824711][10400.926842301] 2025-12-04T11:46:14.8253770Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:46:14.8255223Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_multi_threaded_pg.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:46:14.825057] 2025-12-04T11:46:19.0503903Z 2025-12-04T11:46:19.0505121Z distributed/test_multi_threaded_pg 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_multi_threaded_pg_1.1_0640315784817164_.log 2025-12-04T11:46:19.0517078Z Running 22 items in this shard: test/distributed/test_multi_threaded_pg.py::TestCollectivesWithWrapper::test_all_to_all_single_list, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithWrapper::test_all_to_all_single_none, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithWrapper::test_all_to_all_single_tensor, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithWrapper::test_broadcast_object_list, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithWrapper::test_collective_error_on_rank_non_zero, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithWrapper::test_collective_error_on_rank_non_zero_all, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithWrapper::test_collective_error_on_rank_zero, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithWrapper::test_skip, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_all_reduce, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_all_reduce_coalesced, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_all_reduce_ops, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_all_to_all, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_allgather, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_assert_equal_on_rank, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_broadcast, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_broadcast_object_list, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_bwd_sees_fwd_pg, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_gather, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_reduce_scatter, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_scatter, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_subpg, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_using_pg_from_another_thread 2025-12-04T11:46:19.0527331Z 2025-12-04T11:46:19.0527709Z Finished distributed/test_multi_threaded_pg 1/1 ... [2025-12-04 11:46:19.049880][10405.152012087], took 0.07min 2025-12-04T11:46:19.0797417Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_multi_threaded_pg/distributed.test_multi_threaded_pg-cb00591a34ee6ad2.xml 2025-12-04T11:46:19.1513951Z Running distributed/_composable/fsdp/test_fully_shard_extensions 1/1 ... [2025-12-04 11:46:19.150810][10405.252942738] 2025-12-04T11:46:19.1514714Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:46:19.1516084Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_extensions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:46:19.151170] 2025-12-04T11:46:41.3174103Z 2025-12-04T11:46:41.3175747Z distributed/_composable/fsdp/test_fully_shard_extensions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_extensions_1.1_670c155f675be16b_.log 2025-12-04T11:46:41.3181204Z Running 5 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_extensions.py::TestFullyShardAllGatherExtensionsMultiProcess::test_all_gather_extensions_train_parity, test/distributed/_composable/fsdp/test_fully_shard_extensions.py::TestFullyShardAllGatherExtensionsMultiThread::test_all_gather_extension_hsdp_mesh, test/distributed/_composable/fsdp/test_fully_shard_extensions.py::TestFullyShardAllGatherExtensionsMultiThread::test_all_gather_extension_outer_size_stride, test/distributed/_composable/fsdp/test_fully_shard_extensions.py::TestFullyShardAllGatherExtensionsMultiThread::test_all_gather_extensions_end_to_end, test/distributed/_composable/fsdp/test_fully_shard_extensions.py::TestFullyShardAllGatherExtensionsMultiThread::test_all_gather_extensions_monkey_patch 2025-12-04T11:46:41.3185475Z 2025-12-04T11:46:41.3185975Z Finished distributed/_composable/fsdp/test_fully_shard_extensions 1/1 ... [2025-12-04 11:46:41.316848][10427.418979139], took 0.37min 2025-12-04T11:46:41.3465499Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_extensions/distributed._composable.fsdp.test_fully_shard_extensions-e4e54db12d00fc4b.xml 2025-12-04T11:46:41.4757836Z Running distributed/fsdp/test_wrap 1/1 ... [2025-12-04 11:46:41.475198][10427.577329906] 2025-12-04T11:46:41.4758410Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:46:41.4759626Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_wrap.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:46:41.475551] 2025-12-04T11:49:17.6898368Z 2025-12-04T11:49:17.6899580Z distributed/fsdp/test_wrap 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_wrap_1.1_9a143bedfca4724f_.log 2025-12-04T11:49:17.6927645Z Running 52 items in this shard: test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_bn_always_wrapped_individually, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_error_already_wrapped_nested_False_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_error_already_wrapped_nested_False_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_error_already_wrapped_nested_True_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_error_already_wrapped_nested_True_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload0_backward_prefetch0_forward_prefetch_False_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload0_backward_prefetch0_forward_prefetch_False_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload0_backward_prefetch0_forward_prefetch_True_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload0_backward_prefetch0_forward_prefetch_True_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload0_backward_prefetch1_forward_prefetch_False_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload0_backward_prefetch1_forward_prefetch_False_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload0_backward_prefetch1_forward_prefetch_True_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload0_backward_prefetch1_forward_prefetch_True_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload1_backward_prefetch0_forward_prefetch_False_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload1_backward_prefetch0_forward_prefetch_False_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload1_backward_prefetch0_forward_prefetch_True_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload1_backward_prefetch0_forward_prefetch_True_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload1_backward_prefetch1_forward_prefetch_False_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload1_backward_prefetch1_forward_prefetch_False_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload1_backward_prefetch1_forward_prefetch_True_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload1_backward_prefetch1_forward_prefetch_True_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_wrap_batchnorm_individually_use_or_policy_False, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_wrap_batchnorm_individually_use_or_policy_True, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_zero_argument, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_always_wrap, 
test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_always_wrap_with_ignored_modules_wrap_method0, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_always_wrap_with_ignored_modules_wrap_method1, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_api, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_preset_exclude_wrap, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_preset_exclude_wrap_include_children, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_preset_force_leaf, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_preset_force_leaf_custom, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_smoke_test_device_init_mode0_cpu_offload0_use_device_id_False, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_smoke_test_device_init_mode0_cpu_offload0_use_device_id_True, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_smoke_test_device_init_mode0_cpu_offload1_use_device_id_False, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_smoke_test_device_init_mode0_cpu_offload1_use_device_id_True, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_smoke_test_device_init_mode1_cpu_offload0_use_device_id_False, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_smoke_test_device_init_mode1_cpu_offload0_use_device_id_True, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_smoke_test_device_init_mode1_cpu_offload1_use_device_id_False, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_smoke_test_device_init_mode1_cpu_offload1_use_device_id_True, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_with_ignored_modules_wrap_method0, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_with_ignored_modules_wrap_method1, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_custom_policy, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_frozen_params, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_module_wrap_policy, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_module_wrap_policy_callable, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_transformer_auto_wrap_policy, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_wrap_disabled_outside_context, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_wrap_override_defaults, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_wrap_wrap_method0, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_wrap_wrap_method1, test/distributed/fsdp/test_wrap.py::TestWrapUtils::test_validate_frozen_params 2025-12-04T11:49:17.6955506Z 2025-12-04T11:49:17.6955902Z Finished distributed/fsdp/test_wrap 1/1 ... [2025-12-04 11:49:17.689966][10583.792095752], took 2.60min 2025-12-04T11:49:17.7202689Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_wrap/distributed.fsdp.test_wrap-8d38fac6f1a86713.xml 2025-12-04T11:49:17.8413865Z Running distributed/fsdp/test_fsdp_hybrid_shard 1/1 ... [2025-12-04 11:49:17.840855][10583.942986237] 2025-12-04T11:49:17.8414704Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:49:17.8416021Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_hybrid_shard.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:49:17.841196] 2025-12-04T11:50:15.6547686Z 2025-12-04T11:50:15.6556177Z distributed/fsdp/test_fsdp_hybrid_shard 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_hybrid_shard_1.1_393c4bfd443690e9_.log 2025-12-04T11:50:15.6560502Z Running 6 items in this shard: test/distributed/fsdp/test_fsdp_hybrid_shard.py::TestFSDPHybridShard::test_fsdp_hybrid_shard_basic_setup, test/distributed/fsdp/test_fsdp_hybrid_shard.py::TestFSDPHybridShard::test_fsdp_hybrid_shard_parity, test/distributed/fsdp/test_fsdp_hybrid_shard.py::TestFSDPHybridShard::test_hsdp_save_load_state_dict, test/distributed/fsdp/test_fsdp_hybrid_shard.py::TestFSDPHybridShard::test_hsdp_sync_module_state, test/distributed/fsdp/test_fsdp_hybrid_shard.py::TestFSDPHybridShard::test_invalid_pg_specification_raises, test/distributed/fsdp/test_fsdp_hybrid_shard.py::TestFSDPHybridShard::test_raises_manual_wrap_hybrid_shard_when_none_policy 2025-12-04T11:50:15.6563960Z 2025-12-04T11:50:15.6564384Z Finished distributed/fsdp/test_fsdp_hybrid_shard 1/1 ... [2025-12-04 11:50:15.654595][10641.756724514], took 0.96min 2025-12-04T11:50:15.6853178Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_hybrid_shard/distributed.fsdp.test_fsdp_hybrid_shard-b37436896c0f0a07.xml 2025-12-04T11:50:15.8101296Z Running distributed/_composable/fsdp/test_fully_shard_training 1/1 ... [2025-12-04 11:50:15.809881][10641.912011974] 2025-12-04T11:50:15.8102038Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:50:15.8103943Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_training.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:50:15.810205] 2025-12-04T11:58:39.5211108Z 2025-12-04T11:58:39.5212408Z distributed/_composable/fsdp/test_fully_shard_training 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_training_1.1_6ed6432c508bcf99_.log 2025-12-04T11:58:39.5393180Z Running 25 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardForwardInputs::test_root_move_forward_input_to_device, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardRegisteredParams::test_param_registration_after_backward, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardRegisteredParams::test_param_registration_after_forward, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardCastAfterInit::test_to_float64_after_init, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_explicit_prefetching, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_multi_forward_module, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_non_root_forward_backward, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_post_optim_event, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_train_parity_multi_group, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_train_parity_multi_group_cpu_offload_eager, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_train_parity_multi_group_unshard_async_op, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_train_parity_single_group_shard_dim0, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_train_parity_single_group_shard_largest_dim, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCompose::test_train_parity_with_activation_checkpointing, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardShardPlacementFnMultiProcess::test_train_parity_shard_placement_fn_shard_largest_dim, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardShardPlacementFnMultiThread::test_shard_placement_fn_contiguous_params_grads, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardSharedParams::test_train_parity_with_shared_params, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardGradientAccumulation::test_1f1b_microbatching, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardGradientAccumulation::test_gradient_accumulation, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardNDTraining::test_2d_mlp_with_nd_mesh, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardHSDP3DTraining::test_3d_mlp_with_nd_mesh, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardHSDPTraining::test_train_parity_hsdp, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardCustomForwardMethod::test_register_fsdp_forward_method, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardShareCommContext::test_share_comm_context, 
test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardWorldSize1::test_train_parity_single_worldsize1 2025-12-04T11:58:39.6060155Z 2025-12-04T11:58:39.6067693Z Finished distributed/_composable/fsdp/test_fully_shard_training 1/1 ... [2025-12-04 11:58:39.606565][11145.708689597], took 8.40min 2025-12-04T11:58:39.6371202Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_training/distributed._composable.fsdp.test_fully_shard_training-6c9b30f951a7219e.xml 2025-12-04T11:58:40.6474894Z Uploading artifacts took 0.89 seconds 2025-12-04T11:58:40.6476446Z Running distributed/rpc/cuda/test_tensorpipe_agent 1/2 ... [2025-12-04 11:58:40.647349][11146.749479469] 2025-12-04T11:58:40.6477086Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:58:40.6479593Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/rpc/cuda/test_tensorpipe_agent.py', '--shard-id=1', '--num-shards=2', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:58:40.647716] 2025-12-04T12:07:34.9323719Z 2025-12-04T12:07:34.9324870Z distributed/rpc/cuda/test_tensorpipe_agent 1/2 was successful, full logs can be found in artifacts with path test/test-reports/distributed.rpc.cuda.test_tensorpipe_agent_1.2_384a5ff4692986ce_.log 2025-12-04T12:07:34.9356852Z Running 47 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeCudaRpcTest::test_profiler_remote_cuda, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeCudaDistAutogradTest::test_gpu_simple, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeCudaDistAutogradTest::test_gpu_to_cpu_continuation, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeCudaDistAutogradTest::test_gpu_to_cpu_continuation_gpu_root, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeCudaRemoteModuleTest::test_invalid_devices, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeCudaRemoteModuleTest::test_valid_device, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeCudaDdpComparisonTest::test_ddp_dist_autograd_local_vs_remote_gpu, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_async_execution_with_cuda_future, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_cuda_future_can_extract_cuda_tensor, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_cuda_future_can_extract_list_with_cuda_sparse_tensor, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_cuda_future_can_extract_list_with_cuda_tensor, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_cuda_future_device_not_cuda, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_custom_stream_nested, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_cpu, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_default, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_default_to_non_default, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_1, 
test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_2, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_3, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_4, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_5, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_6, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_self_2, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_self_3, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_self_4, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_self_5, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_self_6, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_self_8, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_non_default_to_default, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_to_cpu_non_default, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_maps_in_options, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_maps_invalid_min_device, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_maps_missing_config_remote, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_maps_missing_config_response_loop, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_maps_one_to_many, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_maps_return_to_gpu, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_maps_return_to_gpu_self, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_maps_wrong_worker_name, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_mismatch, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_devices_option_mismatch_reverse, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_owner_rref_forward_synchronization1, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_owner_rref_forward_synchronization3, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_rref_forward_synchronization1, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_rref_forward_synchronization2, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_rref_to_here_synchronization3, 
test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeCudaDistAutogradTest::test_dist_autograd_sync_streams, test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeCudaDistAutogradTest::test_gradients_synchronizations 2025-12-04T12:07:34.9388328Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeCudaRpcTest::test_profiler_remote_cuda 2025-12-04T12:07:34.9389675Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeCudaDistAutogradTest::test_gpu_simple 2025-12-04T12:07:34.9391095Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeCudaDistAutogradTest::test_gpu_to_cpu_continuation 2025-12-04T12:07:34.9392483Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeCudaDistAutogradTest::test_gpu_to_cpu_continuation_gpu_root 2025-12-04T12:07:34.9393842Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeCudaRemoteModuleTest::test_invalid_devices 2025-12-04T12:07:34.9395088Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeCudaRemoteModuleTest::test_valid_device 2025-12-04T12:07:34.9396496Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeCudaDdpComparisonTest::test_ddp_dist_autograd_local_vs_remote_gpu 2025-12-04T12:07:34.9398017Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_async_execution_with_cuda_future 2025-12-04T12:07:34.9399565Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_cuda_future_can_extract_cuda_tensor 2025-12-04T12:07:34.9401199Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_cuda_future_can_extract_list_with_cuda_sparse_tensor 2025-12-04T12:07:34.9402887Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_cuda_future_can_extract_list_with_cuda_tensor 2025-12-04T12:07:34.9404467Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_cuda_future_device_not_cuda 2025-12-04T12:07:34.9405905Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_custom_stream_nested 2025-12-04T12:07:34.9407312Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_cpu 2025-12-04T12:07:34.9408694Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_default 2025-12-04T12:07:34.9410194Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_default_to_non_default 2025-12-04T12:07:34.9411743Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_1 2025-12-04T12:07:34.9413224Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_2 2025-12-04T12:07:34.9414854Z Running 1 items in this shard: 
test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_3 2025-12-04T12:07:34.9416324Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_4 2025-12-04T12:07:34.9417789Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_5 2025-12-04T12:07:34.9419233Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_6 2025-12-04T12:07:34.9420736Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_self_2 2025-12-04T12:07:34.9422264Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_self_3 2025-12-04T12:07:34.9423824Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_self_4 2025-12-04T12:07:34.9425431Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_self_5 2025-12-04T12:07:34.9426898Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_self_6 2025-12-04T12:07:34.9428247Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_mixed_self_8 2025-12-04T12:07:34.9429678Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_non_default_to_default 2025-12-04T12:07:34.9431112Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_map_gpu_to_cpu_non_default 2025-12-04T12:07:34.9432462Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_maps_in_options 2025-12-04T12:07:34.9433805Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_maps_invalid_min_device 2025-12-04T12:07:34.9435204Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_maps_missing_config_remote 2025-12-04T12:07:34.9436646Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_maps_missing_config_response_loop 2025-12-04T12:07:34.9438027Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_maps_one_to_many 2025-12-04T12:07:34.9439859Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_maps_return_to_gpu 2025-12-04T12:07:34.9441242Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_maps_return_to_gpu_self 2025-12-04T12:07:34.9442616Z Running 1 items in this shard: 
test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_maps_wrong_worker_name 2025-12-04T12:07:34.9443918Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_device_mismatch 2025-12-04T12:07:34.9445256Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_devices_option_mismatch_reverse 2025-12-04T12:07:34.9446667Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_owner_rref_forward_synchronization1 2025-12-04T12:07:34.9448108Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_owner_rref_forward_synchronization3 2025-12-04T12:07:34.9449511Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_rref_forward_synchronization1 2025-12-04T12:07:34.9450882Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_rref_forward_synchronization2 2025-12-04T12:07:34.9452269Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeAgentCudaRpcTest::test_rref_to_here_synchronization3 2025-12-04T12:07:34.9453911Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeCudaDistAutogradTest::test_dist_autograd_sync_streams 2025-12-04T12:07:34.9455548Z Running 1 items in this shard: test/distributed/rpc/cuda/test_tensorpipe_agent.py::TensorPipeTensorPipeCudaDistAutogradTest::test_gradients_synchronizations 2025-12-04T12:07:34.9456397Z 2025-12-04T12:07:34.9456861Z Finished distributed/rpc/cuda/test_tensorpipe_agent 1/2 ... 
[2025-12-04 12:07:34.933643][11681.035771986], took 8.90min 2025-12-04T12:07:34.9647219Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-d961ab4b1fb94450.xml 2025-12-04T12:07:35.0527847Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-2749604adbdd83d7.xml 2025-12-04T12:07:35.0838572Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-a08d4078ed5ab3a4.xml 2025-12-04T12:07:35.1124380Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-f043e483cea1f140.xml 2025-12-04T12:07:35.1405969Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-562cdf6dc98614a4.xml 2025-12-04T12:07:35.1756365Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-7a355d7d848a3783.xml 2025-12-04T12:07:35.2047130Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-e938b1bd7ee21e63.xml 2025-12-04T12:07:35.2336441Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-7b620378c03b2b8c.xml 2025-12-04T12:07:35.2636012Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-51e337155d132168.xml 2025-12-04T12:07:35.3305766Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-858bb2cae53302d6.xml 2025-12-04T12:07:35.3647316Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-f578d6bffa26b363.xml 2025-12-04T12:07:35.3964552Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-3d9a69729c5194dd.xml 2025-12-04T12:07:35.4275790Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-907751cfd0f9a14d.xml 2025-12-04T12:07:35.4545650Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-9a92a0723441ebbb.xml 2025-12-04T12:07:35.4828427Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-000e8f0311241e72.xml 2025-12-04T12:07:35.5359789Z Parsing 
testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-18ef89366410fd29.xml 2025-12-04T12:07:35.5666844Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-ae7171f8ebe30954.xml 2025-12-04T12:07:35.5966830Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-03f5357c50df6990.xml 2025-12-04T12:07:35.6284630Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-df0148e80116049a.xml 2025-12-04T12:07:35.6606954Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-f8722be29ef1f355.xml 2025-12-04T12:07:35.6894835Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-5b2d107716225579.xml 2025-12-04T12:07:35.7189818Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-ad4182055d9f07f8.xml 2025-12-04T12:07:35.7501483Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-b572842ca37510d6.xml 2025-12-04T12:07:35.7828411Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-0b1a30e997ca6431.xml 2025-12-04T12:07:35.8103659Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-e562e66ae42d90cb.xml 2025-12-04T12:07:35.8415134Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-e753f34d0efc412f.xml 2025-12-04T12:07:35.8734237Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-3e869ce9df9ec961.xml 2025-12-04T12:07:35.9028960Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-2b82b1f0b5e5f8cd.xml 2025-12-04T12:07:35.9327658Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-a8c407980a31ffc0.xml 2025-12-04T12:07:35.9606672Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-2efe6b4638116b91.xml 2025-12-04T12:07:35.9907368Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-98c90d11be4a4494.xml 2025-12-04T12:07:36.0226162Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-4931d5d769ccdcd0.xml 2025-12-04T12:07:36.0555616Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-05476064036aedd6.xml 2025-12-04T12:07:36.1045525Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-8e55f782fe295c55.xml 2025-12-04T12:07:36.1367137Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-fed8717c328843a6.xml 2025-12-04T12:07:36.1655468Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-de76ea5d9fe707d8.xml 2025-12-04T12:07:36.1956812Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-0f7e482c95e1e619.xml 2025-12-04T12:07:36.2256148Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-7ab8d1b1bfba2dc7.xml 2025-12-04T12:07:36.2554830Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-79d029624b0186af.xml 2025-12-04T12:07:36.2856623Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-7ef38c3a629f7d6a.xml 2025-12-04T12:07:36.3295000Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-a3eb91c932561739.xml 2025-12-04T12:07:36.3594476Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-5c4ed4bc28536a40.xml 2025-12-04T12:07:36.3884470Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-4d50dcf186234952.xml 2025-12-04T12:07:36.4189126Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-ce507d7407dcfdcd.xml 2025-12-04T12:07:36.4668273Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-fb440b5504373386.xml 2025-12-04T12:07:36.4959404Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-170bafadd5b2b85e.xml 2025-12-04T12:07:36.5282006Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-58170fe4322a80c7.xml 2025-12-04T12:07:36.6016673Z Running distributed/optim/test_zero_redundancy_optimizer 1/1 ... [2025-12-04 12:07:36.601433][11682.703565049] 2025-12-04T12:07:36.6017414Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:07:36.6019382Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/optim/test_zero_redundancy_optimizer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:07:36.601749] 2025-12-04T12:11:47.8071810Z 2025-12-04T12:11:47.8074755Z distributed/optim/test_zero_redundancy_optimizer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.optim.test_zero_redundancy_optimizer_1.1_1138196092c61589_.log 2025-12-04T12:11:47.8112809Z Running 42 items in this shard: test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerSingleRank::test_constructor, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerSingleRank::test_lr_scheduler, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerSingleRank::test_same_dense_param_type, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerSingleRank::test_state_dict, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerSingleRank::test_step_with_extra_inner_key, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerSingleRank::test_step_with_kwargs, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerSingleRank::test_step_without_closure, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerSingleRank::test_zero_grad, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_add_param_group, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_collect_shards, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_False_static_graph_False_shard_buckets_False, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_False_static_graph_False_shard_buckets_True, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_False_static_graph_True_shard_buckets_False, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_False_static_graph_True_shard_buckets_True, 
test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_True_static_graph_False_shard_buckets_False, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_True_static_graph_False_shard_buckets_True, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_True_static_graph_True_shard_buckets_False, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_True_static_graph_True_shard_buckets_True, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_False_static_graph_False_shard_buckets_False, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_False_static_graph_False_shard_buckets_True, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_False_static_graph_True_shard_buckets_False, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_False_static_graph_True_shard_buckets_True, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_True_static_graph_False_shard_buckets_False, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_True_static_graph_False_shard_buckets_True, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_True_static_graph_True_shard_buckets_False, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_True_static_graph_True_shard_buckets_True, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_local_optimizer_parity_optimizer_class_str_AdamW_maximize_False, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_local_optimizer_parity_optimizer_class_str_AdamW_maximize_True, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_local_optimizer_parity_optimizer_class_str_Adam_maximize_False, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_local_optimizer_parity_optimizer_class_str_Adam_maximize_True, 
test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_local_optimizer_parity_optimizer_class_str_SGD_maximize_False, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_local_optimizer_parity_optimizer_class_str_SGD_maximize_True, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_lr_scheduler, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_multiple_param_groups, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_nondefault_process_group, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_sharding, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_step, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_step_with_closure, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_zero_join_cpu, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_zero_join_gpu, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_zero_model_parallel_parameters_as_bucket_view_False, test/distributed/optim/test_zero_redundancy_optimizer.py::TestZeroRedundancyOptimizerDistributed::test_zero_model_parallel_parameters_as_bucket_view_True 2025-12-04T12:11:47.8148044Z 2025-12-04T12:11:47.8148483Z Finished distributed/optim/test_zero_redundancy_optimizer 1/1 ... [2025-12-04 12:11:47.807155][11933.909283462], took 4.19min 2025-12-04T12:11:47.8433098Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.optim.test_zero_redundancy_optimizer/distributed.optim.test_zero_redundancy_optimizer-541994707c39cee5.xml 2025-12-04T12:11:47.9655925Z Running distributed/rpc/test_tensorpipe_agent 1/1 ... [2025-12-04 12:11:47.965214][11934.067345492] 2025-12-04T12:11:47.9656653Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:11:47.9658041Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/rpc/test_tensorpipe_agent.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:11:47.965579] 2025-12-04T12:11:51.7132901Z 2025-12-04T12:11:51.7134450Z distributed/rpc/test_tensorpipe_agent 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.rpc.test_tensorpipe_agent_1.1_e870d0f3c8f660e1_.log 2025-12-04T12:11:51.7135574Z Running 0 items in this shard: 2025-12-04T12:11:51.7135789Z 2025-12-04T12:11:51.7136460Z Finished distributed/rpc/test_tensorpipe_agent 1/1 ... [2025-12-04 12:11:51.712796][11937.81492579], took 0.06min 2025-12-04T12:11:51.7863348Z Running distributed/test_c10d_gloo 2/2 ... [2025-12-04 12:11:51.786091][11937.888223645] 2025-12-04T12:11:51.7863935Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:11:51.7866636Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_c10d_gloo.py', '--shard-id=2', '--num-shards=2', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:11:51.786441] 2025-12-04T12:27:52.2282828Z 2025-12-04T12:27:52.2283891Z distributed/test_c10d_gloo 2/2 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_gloo_2.2_9367ba993beea467_.log 2025-12-04T12:27:52.2342735Z Running 119 items in this shard: test/distributed/test_c10d_gloo.py::RendezvousEnvTest::test_logging_init, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_inference_mode, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_noncontiguous_input, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_async, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_op_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_block_current_stream_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_noncontiguous_input, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_scatter, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_scatter_tensor, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_scatter_tensor_coalesced, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_send_recv_complex, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_cuda_dispatched, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_dataclass_output_unused_param, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_False, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_weight_sharing, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_unused_params_use_reentrant_False, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_weight_sharing_use_reentrant_False, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_register_just_once, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_complex_params, 
test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_invalid_comm_hook_init, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_invalid_comm_hook_return_type, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_find_unused_parameters_when_unused_parameters_empty, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_global_local_unused_params_grad_with_static_graph, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_1gpu_module_device_ids_integer_list, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_1gpu_module_device_ids_torch_device_list, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_2gpu_module, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_4gpu_module, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ignored_output_with_unused_parameters, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_invalid_powerSGD_state, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_save_load_checkpoint, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sync_batch_norm_only_empty_input, test/distributed/test_c10d_gloo.py::ReducerTest::test_forward_backward_optimizer, test/distributed/test_c10d_gloo.py::ReducerTest::test_forward_backward_unused_parameters, test/distributed/test_c10d_gloo.py::ReducerTest::test_multi_dtype_multi_bucket, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_coalesced_async, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_noncontiguous_input, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_barrier_implies_wait, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_multi_device_constructor, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_scatter, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_send_recv_complex, 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_cuda_dispatched, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_into_tensor_coalesced, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_noncontiguous_input, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_async, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_checks_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_barrier_implies_wait, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_empty_tensors, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_noncontiguous_input, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_long, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_multi_device_constructor, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_scatter_tensor, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_short_pickle, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_cuda_dispatched, test/distributed/test_c10d_gloo.py::CommTest::test_bool_tensors, test/distributed/test_c10d_gloo.py::CommTest::test_gloo_warn_not_in_group, test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_incremented_gloo_default, test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_incremented_gloo_subgroup, test/distributed/test_c10d_gloo.py::CommTest::test_tensor_dtype_mismatch, test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_all_to_all_single, 
test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_allreduce_coalesced, test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_collectives, test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_default_process_group, test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_init_process_group_optional_backend, test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_monitored_barrier, test/distributed/test_c10d_gloo.py::LargeCommTest::test_new_group_local_sync, test/distributed/test_c10d_gloo.py::LargeCommTest::test_new_group_local_sync_sanity_check 2025-12-04T12:27:52.2397588Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::RendezvousEnvTest::test_logging_init 2025-12-04T12:27:52.2398465Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_basics 2025-12-04T12:27:52.2399412Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_basics_cuda 2025-12-04T12:27:52.2400381Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_checks 2025-12-04T12:27:52.2401305Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_inference_mode 2025-12-04T12:27:52.2402293Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_noncontiguous_input 2025-12-04T12:27:52.2403282Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_async 2025-12-04T12:27:52.2404239Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_checks 2025-12-04T12:27:52.2405210Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_stress 2025-12-04T12:27:52.2406194Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_op_timeout 2025-12-04T12:27:52.2407119Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_stress_cuda 2025-12-04T12:27:52.2408058Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_block_current_stream_cuda 2025-12-04T12:27:52.2408980Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_checks 2025-12-04T12:27:52.2409855Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_basics 2025-12-04T12:27:52.2410931Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_basics_cuda 2025-12-04T12:27:52.2411858Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_checks 2025-12-04T12:27:52.2412836Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_noncontiguous_input 2025-12-04T12:27:52.2414058Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_stress 2025-12-04T12:27:52.2415106Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_basics_cuda 2025-12-04T12:27:52.2416092Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_checks 2025-12-04T12:27:52.2417069Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_scatter 2025-12-04T12:27:52.2418077Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_scatter_tensor 2025-12-04T12:27:52.2419243Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_scatter_tensor_coalesced 2025-12-04T12:27:52.2420298Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_checks 2025-12-04T12:27:52.2421300Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_stress_cuda 2025-12-04T12:27:52.2422323Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_send_recv_complex 2025-12-04T12:27:52.2423360Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_checks 2025-12-04T12:27:52.2424458Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_cuda_dispatched 2025-12-04T12:27:52.2425655Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_dataclass_output_unused_param 2025-12-04T12:27:52.2427244Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_False 2025-12-04T12:27:52.2428607Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_weight_sharing 2025-12-04T12:27:52.2429962Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_unused_params_use_reentrant_False 2025-12-04T12:27:52.2431367Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_weight_sharing_use_reentrant_False 2025-12-04T12:27:52.2432681Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_register_just_once 2025-12-04T12:27:52.2433819Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_complex_params 2025-12-04T12:27:52.2434928Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_invalid_comm_hook_init 2025-12-04T12:27:52.2436133Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_invalid_comm_hook_return_type 2025-12-04T12:27:52.2437440Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_find_unused_parameters_when_unused_parameters_empty 2025-12-04T12:27:52.2438982Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_global_local_unused_params_grad_with_static_graph 2025-12-04T12:27:52.2440241Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_1gpu_module_device_ids_integer_list 2025-12-04T12:27:52.2441494Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_1gpu_module_device_ids_torch_device_list 2025-12-04T12:27:52.2442655Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_2gpu_module 2025-12-04T12:27:52.2443693Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_4gpu_module 2025-12-04T12:27:52.2444799Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ignored_output_with_unused_parameters 2025-12-04T12:27:52.2445905Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_invalid_powerSGD_state 2025-12-04T12:27:52.2446911Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_save_load_checkpoint 2025-12-04T12:27:52.2447960Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sync_batch_norm_only_empty_input 2025-12-04T12:27:52.2448987Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ReducerTest::test_forward_backward_optimizer 2025-12-04T12:27:52.2449896Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ReducerTest::test_forward_backward_unused_parameters 2025-12-04T12:27:52.2450805Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ReducerTest::test_multi_dtype_multi_bucket 2025-12-04T12:27:52.2451724Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_basics 2025-12-04T12:27:52.2452713Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_basics_cuda 2025-12-04T12:27:52.2454005Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_coalesced_async 2025-12-04T12:27:52.2462061Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_noncontiguous_input 2025-12-04T12:27:52.2463437Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_stress_cuda 2025-12-04T12:27:52.2464572Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_basics 2025-12-04T12:27:52.2465725Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_basics 2025-12-04T12:27:52.2468361Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_barrier_implies_wait 2025-12-04T12:27:52.2469422Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_basics_cuda 2025-12-04T12:27:52.2470470Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_checks 2025-12-04T12:27:52.2471508Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_basics_cuda 2025-12-04T12:27:52.2472538Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_stress 2025-12-04T12:27:52.2473622Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_multi_device_constructor 2025-12-04T12:27:52.2474783Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_basics 2025-12-04T12:27:52.2475750Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_basics_cuda 2025-12-04T12:27:52.2476712Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_checks 2025-12-04T12:27:52.2477648Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_scatter 2025-12-04T12:27:52.2478793Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_stress_cuda 2025-12-04T12:27:52.2480059Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_basics_cuda 2025-12-04T12:27:52.2481157Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_stress 2025-12-04T12:27:52.2482244Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_stress_cuda 2025-12-04T12:27:52.2483444Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_send_recv_complex 2025-12-04T12:27:52.2484614Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_basics_cuda 2025-12-04T12:27:52.2485825Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_checks 2025-12-04T12:27:52.2487039Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_cuda_dispatched 2025-12-04T12:27:52.2488246Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_checks 2025-12-04T12:27:52.2489353Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_into_tensor_coalesced 2025-12-04T12:27:52.2490527Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_noncontiguous_input 2025-12-04T12:27:52.2491831Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_basics 2025-12-04T12:27:52.2492789Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_checks 2025-12-04T12:27:52.2494048Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_async 2025-12-04T12:27:52.2495159Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_basics 2025-12-04T12:27:52.2496261Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_checks 2025-12-04T12:27:52.2497409Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_checks_cuda 2025-12-04T12:27:52.2498590Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_stress_cuda 2025-12-04T12:27:52.2499655Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_barrier_implies_wait 2025-12-04T12:27:52.2500684Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_basics 2025-12-04T12:27:52.2501731Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_basics_cuda 
2025-12-04T12:27:52.2502779Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_checks 2025-12-04T12:27:52.2503779Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_empty_tensors 2025-12-04T12:27:52.2504795Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_basics 2025-12-04T12:27:52.2505914Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_noncontiguous_input 2025-12-04T12:27:52.2506851Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_stress 2025-12-04T12:27:52.2507699Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_long 2025-12-04T12:27:52.2508583Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_multi_device_constructor 2025-12-04T12:27:52.2509507Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_basics 2025-12-04T12:27:52.2510409Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_basics_cuda 2025-12-04T12:27:52.2511343Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_scatter_tensor 2025-12-04T12:27:52.2512244Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_stress 2025-12-04T12:27:52.2513169Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_stress_cuda 2025-12-04T12:27:52.2514071Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_basics 2025-12-04T12:27:52.2514956Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_checks 2025-12-04T12:27:52.2515823Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_stress 2025-12-04T12:27:52.2516730Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_short_pickle 2025-12-04T12:27:52.2517654Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_basics 2025-12-04T12:27:52.2518670Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_cuda_dispatched 2025-12-04T12:27:52.2519561Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_bool_tensors 2025-12-04T12:27:52.2520351Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_gloo_warn_not_in_group 2025-12-04T12:27:52.2521239Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_incremented_gloo_default 2025-12-04T12:27:52.2522174Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_incremented_gloo_subgroup 2025-12-04T12:27:52.2523061Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_tensor_dtype_mismatch 2025-12-04T12:27:52.2524076Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_all_to_all_single 2025-12-04T12:27:52.2525285Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_allreduce_coalesced 2025-12-04T12:27:52.2526499Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_collectives 2025-12-04T12:27:52.2527677Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_default_process_group 2025-12-04T12:27:52.2528989Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_init_process_group_optional_backend 2025-12-04T12:27:52.2530289Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_monitored_barrier 2025-12-04T12:27:52.2531346Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::LargeCommTest::test_new_group_local_sync 2025-12-04T12:27:52.2532239Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::LargeCommTest::test_new_group_local_sync_sanity_check 2025-12-04T12:27:52.2532790Z 2025-12-04T12:27:52.2533109Z Finished distributed/test_c10d_gloo 2/2 ... [2025-12-04 12:27:52.231355][12898.333483608], took 16.01min 2025-12-04T12:27:52.2681128Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-08577689f3d858a6.xml 2025-12-04T12:27:52.3566554Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c743ab120f5be65e.xml 2025-12-04T12:27:52.3879282Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1a323813051995ff.xml 2025-12-04T12:27:52.4201911Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5c255dbdc27d4d77.xml 2025-12-04T12:27:52.4545446Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c9f1202b2ef2d0e8.xml 2025-12-04T12:27:52.5093449Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8281b785ed89747f.xml 2025-12-04T12:27:52.5401345Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-de1cfac33910faa2.xml 2025-12-04T12:27:52.5728870Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-864ab6c46117080c.xml 2025-12-04T12:27:52.6010272Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-76784aced4c97984.xml 2025-12-04T12:27:52.6283143Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-703de7f7dca9caca.xml 2025-12-04T12:27:52.6639866Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0adaa780135147c4.xml 2025-12-04T12:27:52.6932393Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9b446c4fb19b7fa9.xml 2025-12-04T12:27:52.7257341Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a7525f86ad26c33d.xml 2025-12-04T12:27:52.7669896Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d6753a042db8c209.xml 2025-12-04T12:27:52.7982050Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2349534806fd7876.xml 2025-12-04T12:27:52.8256539Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-42ee5035490db9e3.xml 2025-12-04T12:27:52.8529136Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-50ae93e5877e6267.xml 2025-12-04T12:27:52.8799718Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a89286bdf69beec6.xml 2025-12-04T12:27:52.9119876Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3fe0b73bb8411ca0.xml 2025-12-04T12:27:52.9439906Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4b626fd6cfef7d3a.xml 2025-12-04T12:27:52.9729643Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8330e64091d76cd1.xml 2025-12-04T12:27:53.0032419Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c23dbc048e48feb8.xml 2025-12-04T12:27:53.0337057Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-89bb327f63e4e8bb.xml 2025-12-04T12:27:53.0603034Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9c8a9e0d041cedea.xml 2025-12-04T12:27:53.0943519Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-42322dfcd604c1a2.xml 2025-12-04T12:27:53.1257666Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-05269c810dd53a0a.xml 2025-12-04T12:27:53.1577041Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-496a22975dc71079.xml 2025-12-04T12:27:53.1898609Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-63ac84cde87e453b.xml 2025-12-04T12:27:53.2229132Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4f81974b0076c9ff.xml 2025-12-04T12:27:53.2537351Z Parsing testcases for 
test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6030e8ef08b1e09a.xml 2025-12-04T12:27:53.2890895Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bd13602843a8b4fd.xml 2025-12-04T12:27:53.3172846Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9df479069693f235.xml 2025-12-04T12:27:53.3509421Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4af234e53b06ab6b.xml 2025-12-04T12:27:53.3816540Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cc6adfb19ea74de2.xml 2025-12-04T12:27:53.4168140Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8f1e4f77d9fcfd90.xml 2025-12-04T12:27:53.4483712Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-27f8ee33cfcf037f.xml 2025-12-04T12:27:53.4777510Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cdeff74b74265a35.xml 2025-12-04T12:27:53.5092809Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a9993b41f082320c.xml 2025-12-04T12:27:53.5430570Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3652407d28f2c215.xml 2025-12-04T12:27:53.5726747Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d3dba64b9a4678ed.xml 2025-12-04T12:27:53.6041236Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3a5b171659e4eecd.xml 2025-12-04T12:27:53.6449214Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c138db9f1cd0e29e.xml 2025-12-04T12:27:53.6760734Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-113748ca2c5988c5.xml 2025-12-04T12:27:53.7132530Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9d127482d5c0a15f.xml 2025-12-04T12:27:53.7440631Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0c6be5601f204f6a.xml 2025-12-04T12:27:53.7753426Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f818599ff0d45c17.xml 2025-12-04T12:27:53.8060851Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6285dc53d0288723.xml 2025-12-04T12:27:53.8450731Z Parsing 
testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-13ada6365eaa3764.xml 2025-12-04T12:27:53.8776510Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a8eee205d4f58a65.xml 2025-12-04T12:27:53.9094127Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-88abaaf5c04af9c6.xml 2025-12-04T12:27:53.9382516Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f3a90decd7629fa5.xml 2025-12-04T12:27:53.9672419Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9d96b951f094adb8.xml 2025-12-04T12:27:53.9948265Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-21b096edd5ee1b6a.xml 2025-12-04T12:27:54.0250052Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a0b61d4860845bc5.xml 2025-12-04T12:27:54.0551635Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0dbdae2b70ece5b8.xml 2025-12-04T12:27:54.0911999Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-af13cf2684481c71.xml 2025-12-04T12:27:54.1210373Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8d92a42ccc718652.xml 2025-12-04T12:27:54.1570080Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3c2b86f8fd4d9656.xml 2025-12-04T12:27:54.1878876Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5a0ef3ecf28b8b71.xml 2025-12-04T12:27:54.2193960Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-dde6a545db9a9d8a.xml 2025-12-04T12:27:54.2517988Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e5facf21e5ab1561.xml 2025-12-04T12:27:54.2879202Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8fe4943e76f9400e.xml 2025-12-04T12:27:54.3162343Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1c6197817aebc3a0.xml 2025-12-04T12:27:54.3496777Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-aec658dbe3b6bd3e.xml 2025-12-04T12:27:54.3841884Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eda16941220f320e.xml 
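The run of "Parsing testcases for test report: ...xml" entries above corresponds to reading the per-process junit XML files that pytest drops under test/test-reports/python-pytest/<suite>/. As a rough illustration only (the parser actually used by this job is part of PyTorch's CI tooling and is not shown in this log), a minimal standard-library sketch of summarizing one such report could look like the following; the report path is a placeholder modeled on the paths above.

# Minimal sketch: summarize one pytest junit XML report.
# Illustration only -- this is NOT the parser used by the CI job above;
# the report path below is a placeholder modeled on the paths in this log.
import xml.etree.ElementTree as ET

def summarize_junit(path: str) -> dict:
    root = ET.parse(path).getroot()
    # pytest writes either a <testsuite> root or a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for suite in suites:
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals

if __name__ == "__main__":
    print(summarize_junit(
        "test/test-reports/python-pytest/distributed.test_store/example.xml"
    ))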
2025-12-04T12:27:54.4191556Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e4305ddff77c0f5f.xml 2025-12-04T12:27:54.4531519Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0e3a131c8f0d73a7.xml 2025-12-04T12:27:54.4889028Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1ba7f15a8c05fd39.xml 2025-12-04T12:27:54.5218478Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-af2c578cafcc8d63.xml 2025-12-04T12:27:54.5521285Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-578bb605dc4ba552.xml 2025-12-04T12:27:54.5811378Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b86217cb65e9b710.xml 2025-12-04T12:27:54.6152197Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f2854c8345d5fc1c.xml 2025-12-04T12:27:54.6476837Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c653d1bf5ed2c0fa.xml 2025-12-04T12:27:54.6807309Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-55cd7c7ba73bcdc2.xml 2025-12-04T12:27:54.7130689Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-ffbbe82dbea62111.xml 2025-12-04T12:27:54.7448863Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-538ab2b49097f28c.xml 2025-12-04T12:27:54.7752085Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fe265780a95359c3.xml 2025-12-04T12:27:54.8067156Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-faa48c97f4df83cb.xml 2025-12-04T12:27:54.8412042Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c0cd36b746cb3623.xml 2025-12-04T12:27:54.8711690Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3b1171fb862d722e.xml 2025-12-04T12:27:54.8998021Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-df4dd3eb2cdaa291.xml 2025-12-04T12:27:54.9296397Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-dd69f8dcdadc9fb4.xml 2025-12-04T12:27:54.9589183Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f54bdf9dde5bf97e.xml 2025-12-04T12:27:54.9931582Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-83c9fcb3057380d5.xml 2025-12-04T12:27:55.0251681Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e826f0e43397861e.xml 2025-12-04T12:27:55.0598588Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e73673b2d41bd335.xml 2025-12-04T12:27:55.0911496Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1456794f12812e80.xml 2025-12-04T12:27:55.1193414Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e4e8c70c34dfdfe1.xml 2025-12-04T12:27:55.1530204Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-be166d073ca55795.xml 2025-12-04T12:27:55.1834374Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c49bebe8dd8e46f5.xml 2025-12-04T12:27:55.2162336Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c0f82bf827258b5a.xml 2025-12-04T12:27:55.2469303Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-983f78b8ab7f02b9.xml 2025-12-04T12:27:55.2791142Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1ceb5c06515b90e6.xml 2025-12-04T12:27:55.3217387Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8871c4d5afa11da1.xml 2025-12-04T12:27:55.3949492Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-005dd80a1d165bd6.xml 2025-12-04T12:27:55.4292645Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b8dfb4b667064c84.xml 2025-12-04T12:27:55.4606666Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-279f10471d7d2185.xml 2025-12-04T12:27:55.5049091Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2732174555722247.xml 2025-12-04T12:27:55.5371936Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-39eff7f371737bec.xml 2025-12-04T12:27:55.5694825Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1bd2ca8c9ccb5d1d.xml 2025-12-04T12:27:55.6009885Z Parsing testcases for 
test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6de6dc4ebcbeafe7.xml 2025-12-04T12:27:55.6911237Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ef1db87d402e61e.xml 2025-12-04T12:27:55.7368605Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-07b0bb118974fac5.xml 2025-12-04T12:27:55.7719385Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b45dd1b51a454859.xml 2025-12-04T12:27:55.8042233Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c5c7f8af22555084.xml 2025-12-04T12:27:55.8371079Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0d5f0a9727dbb8c1.xml 2025-12-04T12:27:55.8710918Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-efc5df0b6f603c8e.xml 2025-12-04T12:27:55.9081169Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d3947ad663b8a0a5.xml 2025-12-04T12:27:55.9432308Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9a8e98b254282d7c.xml 2025-12-04T12:27:55.9771134Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6f2a37a539841f4a.xml 2025-12-04T12:27:56.0138559Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-11da4acc57c01e3e.xml 2025-12-04T12:27:56.0603292Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-efa00c53d9ffafd0.xml 2025-12-04T12:27:56.0998074Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c139de0f468bfbd0.xml 2025-12-04T12:27:56.1484551Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-508c49e985b343e0.xml 2025-12-04T12:27:56.1918048Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-85c75a6ef966eda2.xml 2025-12-04T12:27:56.2368927Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d3b5821ebfa1d2b9.xml 2025-12-04T12:27:56.2659922Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-78c3c6f081c06510.xml 2025-12-04T12:27:56.3011564Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eceb8537f545c10b.xml 2025-12-04T12:27:56.3369175Z Parsing 
testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-997f8b36df0838da.xml 2025-12-04T12:27:57.4038824Z Uploading artifacts took 0.99 seconds 2025-12-04T12:27:57.4040482Z Running distributed/test_launcher 1/1 ... [2025-12-04 12:27:57.403805][12903.505935731] 2025-12-04T12:27:57.4041060Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:27:57.4043735Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_launcher.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:27:57.404191] 2025-12-04T12:28:02.2809597Z 2025-12-04T12:28:02.2810903Z distributed/test_launcher 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_launcher_1.1_d44269eab8f8d94e_.log 2025-12-04T12:28:02.2812337Z Running 1 items in this shard: test/distributed/test_launcher.py::TestDistributedLaunch::test_launch_user_script 2025-12-04T12:28:02.2812913Z 2025-12-04T12:28:02.2813401Z Finished distributed/test_launcher 1/1 ... [2025-12-04 12:28:02.280326][12908.382456406], took 0.08min 2025-12-04T12:28:02.3358196Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_launcher/distributed.test_launcher-ab711efd5b5eae9c.xml 2025-12-04T12:28:02.4102279Z Running distributed/test_store 1/1 ... [2025-12-04 12:28:02.410000][12908.512131032] 2025-12-04T12:28:02.4102882Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:28:02.4105754Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_store.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:28:02.410373] 2025-12-04T12:36:29.5211275Z 2025-12-04T12:36:29.5212791Z distributed/test_store 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_store_1.1_9c10cd7a4c8d4bdb_.log 2025-12-04T12:36:29.5260543Z Running 126 items in this shard: test/distributed/test_store.py::FileStoreTest::test_append, test/distributed/test_store.py::FileStoreTest::test_clone, test/distributed/test_store.py::FileStoreTest::test_compare_set, test/distributed/test_store.py::FileStoreTest::test_init_pg_and_rpc_with_same_file, test/distributed/test_store.py::FileStoreTest::test_list_keys, test/distributed/test_store.py::FileStoreTest::test_multi_get, test/distributed/test_store.py::FileStoreTest::test_multi_set, test/distributed/test_store.py::FileStoreTest::test_queues, test/distributed/test_store.py::FileStoreTest::test_queues_bidirectional, test/distributed/test_store.py::FileStoreTest::test_queues_nonblocking, test/distributed/test_store.py::FileStoreTest::test_queues_timeout, test/distributed/test_store.py::FileStoreTest::test_refcount, test/distributed/test_store.py::FileStoreTest::test_set_get_check, test/distributed/test_store.py::FileStoreTest::test_simple_wait, test/distributed/test_store.py::HashStoreTest::test_append, test/distributed/test_store.py::HashStoreTest::test_clone, test/distributed/test_store.py::HashStoreTest::test_compare_set, test/distributed/test_store.py::HashStoreTest::test_list_keys, test/distributed/test_store.py::HashStoreTest::test_multi_get, test/distributed/test_store.py::HashStoreTest::test_multi_set, test/distributed/test_store.py::HashStoreTest::test_queues, test/distributed/test_store.py::HashStoreTest::test_queues_bidirectional, test/distributed/test_store.py::HashStoreTest::test_queues_nonblocking, test/distributed/test_store.py::HashStoreTest::test_queues_timeout, test/distributed/test_store.py::HashStoreTest::test_set_get_check, test/distributed/test_store.py::HashStoreTest::test_simple_wait, test/distributed/test_store.py::PrefixStoreTest::test_get_underlying_store, test/distributed/test_store.py::PrefixFileStoreTest::test_append, test/distributed/test_store.py::PrefixFileStoreTest::test_clone, test/distributed/test_store.py::PrefixFileStoreTest::test_compare_set, test/distributed/test_store.py::PrefixFileStoreTest::test_list_keys, test/distributed/test_store.py::PrefixFileStoreTest::test_multi_get, test/distributed/test_store.py::PrefixFileStoreTest::test_multi_set, test/distributed/test_store.py::PrefixFileStoreTest::test_queues, test/distributed/test_store.py::PrefixFileStoreTest::test_queues_bidirectional, test/distributed/test_store.py::PrefixFileStoreTest::test_queues_nonblocking, test/distributed/test_store.py::PrefixFileStoreTest::test_queues_timeout, test/distributed/test_store.py::PrefixFileStoreTest::test_set_get_check, test/distributed/test_store.py::PrefixFileStoreTest::test_simple_wait, test/distributed/test_store.py::TCPStoreTest::test_address_already_in_use, test/distributed/test_store.py::TCPStoreTest::test_agent_store, test/distributed/test_store.py::TCPStoreTest::test_append, test/distributed/test_store.py::TCPStoreTest::test_clone, test/distributed/test_store.py::TCPStoreTest::test_compare_set, test/distributed/test_store.py::TCPStoreTest::test_init_pg_and_rpc_with_same_socket, test/distributed/test_store.py::TCPStoreTest::test_list_keys, test/distributed/test_store.py::TCPStoreTest::test_multi_get, test/distributed/test_store.py::TCPStoreTest::test_multi_set, 
test/distributed/test_store.py::TCPStoreTest::test_multi_worker_with_fixed_world_size, test/distributed/test_store.py::TCPStoreTest::test_multi_worker_with_nonfixed_world_size, test/distributed/test_store.py::TCPStoreTest::test_multitenancy, test/distributed/test_store.py::TCPStoreTest::test_numkeys_delkeys, test/distributed/test_store.py::TCPStoreTest::test_queues, test/distributed/test_store.py::TCPStoreTest::test_queues_bidirectional, test/distributed/test_store.py::TCPStoreTest::test_queues_nonblocking, test/distributed/test_store.py::TCPStoreTest::test_queues_timeout, test/distributed/test_store.py::TCPStoreTest::test_repr, test/distributed/test_store.py::TCPStoreTest::test_set_get_check, test/distributed/test_store.py::TCPStoreTest::test_simple_wait, test/distributed/test_store.py::TCPStoreTest::test_store_timeout_on_missing_clients, test/distributed/test_store.py::TCPStoreTest::test_take_over_listen_socket, test/distributed/test_store.py::TCPStoreTest::test_world_size_0_raises, test/distributed/test_store.py::LibUvTCPStoreTest::test_address_already_in_use, test/distributed/test_store.py::LibUvTCPStoreTest::test_agent_store, test/distributed/test_store.py::LibUvTCPStoreTest::test_append, test/distributed/test_store.py::LibUvTCPStoreTest::test_clone, test/distributed/test_store.py::LibUvTCPStoreTest::test_compare_set, test/distributed/test_store.py::LibUvTCPStoreTest::test_init_pg_and_rpc_with_same_socket, test/distributed/test_store.py::LibUvTCPStoreTest::test_list_keys, test/distributed/test_store.py::LibUvTCPStoreTest::test_multi_get, test/distributed/test_store.py::LibUvTCPStoreTest::test_multi_set, test/distributed/test_store.py::LibUvTCPStoreTest::test_multi_worker_with_fixed_world_size, test/distributed/test_store.py::LibUvTCPStoreTest::test_multi_worker_with_nonfixed_world_size, test/distributed/test_store.py::LibUvTCPStoreTest::test_multitenancy, test/distributed/test_store.py::LibUvTCPStoreTest::test_numkeys_delkeys, test/distributed/test_store.py::LibUvTCPStoreTest::test_queues, test/distributed/test_store.py::LibUvTCPStoreTest::test_queues_bidirectional, test/distributed/test_store.py::LibUvTCPStoreTest::test_queues_nonblocking, test/distributed/test_store.py::LibUvTCPStoreTest::test_queues_timeout, test/distributed/test_store.py::LibUvTCPStoreTest::test_repr, test/distributed/test_store.py::LibUvTCPStoreTest::test_set_get_check, test/distributed/test_store.py::LibUvTCPStoreTest::test_simple_wait, test/distributed/test_store.py::LibUvTCPStoreTest::test_store_timeout_on_missing_clients, test/distributed/test_store.py::LibUvTCPStoreTest::test_take_over_listen_socket, test/distributed/test_store.py::LibUvTCPStoreTest::test_world_size_0_raises, test/distributed/test_store.py::PrefixTCPStoreTest::test_append, test/distributed/test_store.py::PrefixTCPStoreTest::test_clone, test/distributed/test_store.py::PrefixTCPStoreTest::test_compare_set, test/distributed/test_store.py::PrefixTCPStoreTest::test_list_keys, test/distributed/test_store.py::PrefixTCPStoreTest::test_multi_get, test/distributed/test_store.py::PrefixTCPStoreTest::test_multi_set, test/distributed/test_store.py::PrefixTCPStoreTest::test_queues, test/distributed/test_store.py::PrefixTCPStoreTest::test_queues_bidirectional, test/distributed/test_store.py::PrefixTCPStoreTest::test_queues_nonblocking, test/distributed/test_store.py::PrefixTCPStoreTest::test_queues_timeout, test/distributed/test_store.py::PrefixTCPStoreTest::test_set_get_check, test/distributed/test_store.py::PrefixTCPStoreTest::test_simple_wait, 
test/distributed/test_store.py::PrefixTCPStoreTest::test_underlying_non_prefix_store, test/distributed/test_store.py::PythonStoreTest::test_set_get, test/distributed/test_store.py::RendezvousTest::test_unknown_handler, test/distributed/test_store.py::RendezvousTest::test_url_with_node_params, test/distributed/test_store.py::RendezvousEnvTest::test_nominal, test/distributed/test_store.py::RendezvousFileTest::test_common_errors, test/distributed/test_store.py::RendezvousFileTest::test_nominal, test/distributed/test_store.py::RendezvousTCPTest::test_common_errors, test/distributed/test_store.py::RendezvousTCPTest::test_dns_timeout, test/distributed/test_store.py::RendezvousTCPTest::test_nominal, test/distributed/test_store.py::RendezvousTCPTest::test_tcp_store_timeout_doest_break_client, test/distributed/test_store.py::RendezvousTCPTest::test_tcp_store_timeout_set, test/distributed/test_store.py::RendezvousTCPTest::test_tcp_store_url_with_libuv, test/distributed/test_store.py::TestPythonStore::test_append_roundtrip, test/distributed/test_store.py::TestPythonStore::test_extended_methods_fallbacks, test/distributed/test_store.py::TestPythonStore::test_has_extended_api_passthrough, test/distributed/test_store.py::TestPythonStore::test_has_extended_api_roundtrip, test/distributed/test_store.py::TestPythonStore::test_multi_get_roundtrip, test/distributed/test_store.py::TestPythonStore::test_multi_set_roundtrip, test/distributed/test_store.py::TestPythonStore::test_optional_methods_fail, test/distributed/test_store.py::TestMultiThreadedWait::test_wait_file_store, test/distributed/test_store.py::TestMultiThreadedWait::test_wait_hash_store, test/distributed/test_store.py::TestMultiThreadedWait::test_wait_prefix_file_store, test/distributed/test_store.py::TestMultiThreadedWait::test_wait_tcp_store, test/distributed/test_store.py::TestMultiThreadedWait::test_wait_tcp_store_uv, test/distributed/test_store.py::TimeoutTest::test_interrupt_doesnt_break_wait, test/distributed/test_store.py::InitPgWithNonUvStore::test_with_env_var, test/distributed/test_store.py::InitPgWithNonUvStore::test_with_url_param, test/distributed/test_store.py::TestClientProtocol::test_client_connect 2025-12-04T12:36:29.5305954Z Running 1 items in this shard: test/distributed/test_store.py::FileStoreTest::test_append 2025-12-04T12:36:29.5306733Z Running 1 items in this shard: test/distributed/test_store.py::FileStoreTest::test_clone 2025-12-04T12:36:29.5307502Z Running 1 items in this shard: test/distributed/test_store.py::FileStoreTest::test_compare_set 2025-12-04T12:36:29.5308395Z Running 1 items in this shard: test/distributed/test_store.py::FileStoreTest::test_init_pg_and_rpc_with_same_file 2025-12-04T12:36:29.5309270Z Running 1 items in this shard: test/distributed/test_store.py::FileStoreTest::test_list_keys 2025-12-04T12:36:29.5310143Z Running 1 items in this shard: test/distributed/test_store.py::FileStoreTest::test_multi_get 2025-12-04T12:36:29.5310870Z Running 1 items in this shard: test/distributed/test_store.py::FileStoreTest::test_multi_set 2025-12-04T12:36:29.5311599Z Running 1 items in this shard: test/distributed/test_store.py::FileStoreTest::test_queues 2025-12-04T12:36:29.5312426Z Running 1 items in this shard: test/distributed/test_store.py::FileStoreTest::test_queues_bidirectional 2025-12-04T12:36:29.5313259Z Running 1 items in this shard: test/distributed/test_store.py::FileStoreTest::test_queues_nonblocking 2025-12-04T12:36:29.5314051Z Running 1 items in this shard: 
test/distributed/test_store.py::FileStoreTest::test_queues_timeout 2025-12-04T12:36:29.5314811Z Running 1 items in this shard: test/distributed/test_store.py::FileStoreTest::test_refcount 2025-12-04T12:36:29.5315569Z Running 1 items in this shard: test/distributed/test_store.py::FileStoreTest::test_set_get_check 2025-12-04T12:36:29.5316327Z Running 1 items in this shard: test/distributed/test_store.py::FileStoreTest::test_simple_wait 2025-12-04T12:36:29.5317064Z Running 1 items in this shard: test/distributed/test_store.py::HashStoreTest::test_append 2025-12-04T12:36:29.5317816Z Running 1 items in this shard: test/distributed/test_store.py::HashStoreTest::test_clone 2025-12-04T12:36:29.5318546Z Running 1 items in this shard: test/distributed/test_store.py::HashStoreTest::test_compare_set 2025-12-04T12:36:29.5319283Z Running 1 items in this shard: test/distributed/test_store.py::HashStoreTest::test_list_keys 2025-12-04T12:36:29.5320026Z Running 1 items in this shard: test/distributed/test_store.py::HashStoreTest::test_multi_get 2025-12-04T12:36:29.5320764Z Running 1 items in this shard: test/distributed/test_store.py::HashStoreTest::test_multi_set 2025-12-04T12:36:29.5321487Z Running 1 items in this shard: test/distributed/test_store.py::HashStoreTest::test_queues 2025-12-04T12:36:29.5322244Z Running 1 items in this shard: test/distributed/test_store.py::HashStoreTest::test_queues_bidirectional 2025-12-04T12:36:29.5323076Z Running 1 items in this shard: test/distributed/test_store.py::HashStoreTest::test_queues_nonblocking 2025-12-04T12:36:29.5323878Z Running 1 items in this shard: test/distributed/test_store.py::HashStoreTest::test_queues_timeout 2025-12-04T12:36:29.5324686Z Running 1 items in this shard: test/distributed/test_store.py::HashStoreTest::test_set_get_check 2025-12-04T12:36:29.5325448Z Running 1 items in this shard: test/distributed/test_store.py::HashStoreTest::test_simple_wait 2025-12-04T12:36:29.5326277Z Running 1 items in this shard: test/distributed/test_store.py::PrefixStoreTest::test_get_underlying_store 2025-12-04T12:36:29.5327090Z Running 1 items in this shard: test/distributed/test_store.py::PrefixFileStoreTest::test_append 2025-12-04T12:36:29.5327861Z Running 1 items in this shard: test/distributed/test_store.py::PrefixFileStoreTest::test_clone 2025-12-04T12:36:29.5328639Z Running 1 items in this shard: test/distributed/test_store.py::PrefixFileStoreTest::test_compare_set 2025-12-04T12:36:29.5329491Z Running 1 items in this shard: test/distributed/test_store.py::PrefixFileStoreTest::test_list_keys 2025-12-04T12:36:29.5330287Z Running 1 items in this shard: test/distributed/test_store.py::PrefixFileStoreTest::test_multi_get 2025-12-04T12:36:29.5331086Z Running 1 items in this shard: test/distributed/test_store.py::PrefixFileStoreTest::test_multi_set 2025-12-04T12:36:29.5331859Z Running 1 items in this shard: test/distributed/test_store.py::PrefixFileStoreTest::test_queues 2025-12-04T12:36:29.5332691Z Running 1 items in this shard: test/distributed/test_store.py::PrefixFileStoreTest::test_queues_bidirectional 2025-12-04T12:36:29.5333628Z Running 1 items in this shard: test/distributed/test_store.py::PrefixFileStoreTest::test_queues_nonblocking 2025-12-04T12:36:29.5334727Z Running 1 items in this shard: test/distributed/test_store.py::PrefixFileStoreTest::test_queues_timeout 2025-12-04T12:36:29.5335650Z Running 1 items in this shard: test/distributed/test_store.py::PrefixFileStoreTest::test_set_get_check 2025-12-04T12:36:29.5336578Z Running 1 items in this shard: 
test/distributed/test_store.py::PrefixFileStoreTest::test_simple_wait 2025-12-04T12:36:29.5337511Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_address_already_in_use 2025-12-04T12:36:29.5338400Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_agent_store 2025-12-04T12:36:29.5339256Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_append 2025-12-04T12:36:29.5340038Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_clone 2025-12-04T12:36:29.5340851Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_compare_set 2025-12-04T12:36:29.5341787Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_init_pg_and_rpc_with_same_socket 2025-12-04T12:36:29.5342695Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_list_keys 2025-12-04T12:36:29.5343516Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_multi_get 2025-12-04T12:36:29.5344367Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_multi_set 2025-12-04T12:36:29.5345289Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_multi_worker_with_fixed_world_size 2025-12-04T12:36:29.5346393Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_multi_worker_with_nonfixed_world_size 2025-12-04T12:36:29.5347253Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_multitenancy 2025-12-04T12:36:29.5348025Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_numkeys_delkeys 2025-12-04T12:36:29.5348750Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_queues 2025-12-04T12:36:29.5349510Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_queues_bidirectional 2025-12-04T12:36:29.5350504Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_queues_nonblocking 2025-12-04T12:36:29.5351440Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_queues_timeout 2025-12-04T12:36:29.5352199Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_repr 2025-12-04T12:36:29.5352967Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_set_get_check 2025-12-04T12:36:29.5353801Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_simple_wait 2025-12-04T12:36:29.5354686Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_store_timeout_on_missing_clients 2025-12-04T12:36:29.5355600Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_take_over_listen_socket 2025-12-04T12:36:29.5356474Z Running 1 items in this shard: test/distributed/test_store.py::TCPStoreTest::test_world_size_0_raises 2025-12-04T12:36:29.5357410Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_address_already_in_use 2025-12-04T12:36:29.5358299Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_agent_store 2025-12-04T12:36:29.5359278Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_append 2025-12-04T12:36:29.5360098Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_clone 
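The FileStoreTest / HashStoreTest / TCPStoreTest / Prefix*StoreTest items listed above exercise the c10d key/value Store API. As a minimal sketch of that API outside the CI harness (assumes a local PyTorch install; this snippet is not part of the job above), the file-backed store and its prefixed wrapper behave roughly like this:

# Minimal sketch of the c10d Store API covered by the FileStoreTest /
# PrefixStoreTest cases listed above. Illustration only, run outside CI;
# requires a local PyTorch install.
import tempfile
import torch.distributed as dist

# Back the store with a throwaway file (left behind on purpose for brevity).
with tempfile.NamedTemporaryFile(delete=False) as f:
    store_path = f.name

# FileStore(path, world_size): a file-backed store shared by all ranks.
store = dist.FileStore(store_path, 1)
store.set("step", "1")        # values are stored as bytes
assert store.get("step") == b"1"
store.add("counter", 5)       # atomic increment, returns the new value
store.wait(["step"])          # blocks until the given keys exist

# PrefixStore namespaces keys on top of an existing store, as exercised by
# the Prefix*StoreTest cases above.
prefixed = dist.PrefixStore("trainer0/", store)
prefixed.set("status", "ready")
assert prefixed.get("status") == b"ready"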
2025-12-04T12:36:29.5360938Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_compare_set 2025-12-04T12:36:29.5361891Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_init_pg_and_rpc_with_same_socket 2025-12-04T12:36:29.5362847Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_list_keys 2025-12-04T12:36:29.5363696Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_multi_get 2025-12-04T12:36:29.5364539Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_multi_set 2025-12-04T12:36:29.5365487Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_multi_worker_with_fixed_world_size 2025-12-04T12:36:29.5366607Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_multi_worker_with_nonfixed_world_size 2025-12-04T12:36:29.5367603Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_multitenancy 2025-12-04T12:36:29.5368495Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_numkeys_delkeys 2025-12-04T12:36:29.5369346Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_queues 2025-12-04T12:36:29.5370231Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_queues_bidirectional 2025-12-04T12:36:29.5371266Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_queues_nonblocking 2025-12-04T12:36:29.5372176Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_queues_timeout 2025-12-04T12:36:29.5372985Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_repr 2025-12-04T12:36:29.5374046Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_set_get_check 2025-12-04T12:36:29.5374961Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_simple_wait 2025-12-04T12:36:29.5375952Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_store_timeout_on_missing_clients 2025-12-04T12:36:29.5376983Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_take_over_listen_socket 2025-12-04T12:36:29.5377961Z Running 1 items in this shard: test/distributed/test_store.py::LibUvTCPStoreTest::test_world_size_0_raises 2025-12-04T12:36:29.5379070Z Running 1 items in this shard: test/distributed/test_store.py::PrefixTCPStoreTest::test_append 2025-12-04T12:36:29.5379926Z Running 1 items in this shard: test/distributed/test_store.py::PrefixTCPStoreTest::test_clone 2025-12-04T12:36:29.5380811Z Running 1 items in this shard: test/distributed/test_store.py::PrefixTCPStoreTest::test_compare_set 2025-12-04T12:36:29.5381720Z Running 1 items in this shard: test/distributed/test_store.py::PrefixTCPStoreTest::test_list_keys 2025-12-04T12:36:29.5382671Z Running 1 items in this shard: test/distributed/test_store.py::PrefixTCPStoreTest::test_multi_get 2025-12-04T12:36:29.5383540Z Running 1 items in this shard: test/distributed/test_store.py::PrefixTCPStoreTest::test_multi_set 2025-12-04T12:36:29.5384414Z Running 1 items in this shard: test/distributed/test_store.py::PrefixTCPStoreTest::test_queues 2025-12-04T12:36:29.5385339Z Running 1 items in this shard: 
test/distributed/test_store.py::PrefixTCPStoreTest::test_queues_bidirectional 2025-12-04T12:36:29.5386363Z Running 1 items in this shard: test/distributed/test_store.py::PrefixTCPStoreTest::test_queues_nonblocking 2025-12-04T12:36:29.5387301Z Running 1 items in this shard: test/distributed/test_store.py::PrefixTCPStoreTest::test_queues_timeout 2025-12-04T12:36:29.5388238Z Running 1 items in this shard: test/distributed/test_store.py::PrefixTCPStoreTest::test_set_get_check 2025-12-04T12:36:29.5389169Z Running 1 items in this shard: test/distributed/test_store.py::PrefixTCPStoreTest::test_simple_wait 2025-12-04T12:36:29.5390158Z Running 1 items in this shard: test/distributed/test_store.py::PrefixTCPStoreTest::test_underlying_non_prefix_store 2025-12-04T12:36:29.5391251Z Running 1 items in this shard: test/distributed/test_store.py::PythonStoreTest::test_set_get 2025-12-04T12:36:29.5392027Z Running 1 items in this shard: test/distributed/test_store.py::RendezvousTest::test_unknown_handler 2025-12-04T12:36:29.5392851Z Running 1 items in this shard: test/distributed/test_store.py::RendezvousTest::test_url_with_node_params 2025-12-04T12:36:29.5393658Z Running 1 items in this shard: test/distributed/test_store.py::RendezvousEnvTest::test_nominal 2025-12-04T12:36:29.5394444Z Running 1 items in this shard: test/distributed/test_store.py::RendezvousFileTest::test_common_errors 2025-12-04T12:36:29.5395283Z Running 1 items in this shard: test/distributed/test_store.py::RendezvousFileTest::test_nominal 2025-12-04T12:36:29.5396075Z Running 1 items in this shard: test/distributed/test_store.py::RendezvousTCPTest::test_common_errors 2025-12-04T12:36:29.5396868Z Running 1 items in this shard: test/distributed/test_store.py::RendezvousTCPTest::test_dns_timeout 2025-12-04T12:36:29.5397652Z Running 1 items in this shard: test/distributed/test_store.py::RendezvousTCPTest::test_nominal 2025-12-04T12:36:29.5398538Z Running 1 items in this shard: test/distributed/test_store.py::RendezvousTCPTest::test_tcp_store_timeout_doest_break_client 2025-12-04T12:36:29.5399489Z Running 1 items in this shard: test/distributed/test_store.py::RendezvousTCPTest::test_tcp_store_timeout_set 2025-12-04T12:36:29.5400362Z Running 1 items in this shard: test/distributed/test_store.py::RendezvousTCPTest::test_tcp_store_url_with_libuv 2025-12-04T12:36:29.5401266Z Running 1 items in this shard: test/distributed/test_store.py::TestPythonStore::test_append_roundtrip 2025-12-04T12:36:29.5402141Z Running 1 items in this shard: test/distributed/test_store.py::TestPythonStore::test_extended_methods_fallbacks 2025-12-04T12:36:29.5403056Z Running 1 items in this shard: test/distributed/test_store.py::TestPythonStore::test_has_extended_api_passthrough 2025-12-04T12:36:29.5403951Z Running 1 items in this shard: test/distributed/test_store.py::TestPythonStore::test_has_extended_api_roundtrip 2025-12-04T12:36:29.5404814Z Running 1 items in this shard: test/distributed/test_store.py::TestPythonStore::test_multi_get_roundtrip 2025-12-04T12:36:29.5405647Z Running 1 items in this shard: test/distributed/test_store.py::TestPythonStore::test_multi_set_roundtrip 2025-12-04T12:36:29.5406496Z Running 1 items in this shard: test/distributed/test_store.py::TestPythonStore::test_optional_methods_fail 2025-12-04T12:36:29.5407334Z Running 1 items in this shard: test/distributed/test_store.py::TestMultiThreadedWait::test_wait_file_store 2025-12-04T12:36:29.5408186Z Running 1 items in this shard: 
test/distributed/test_store.py::TestMultiThreadedWait::test_wait_hash_store 2025-12-04T12:36:29.5409099Z Running 1 items in this shard: test/distributed/test_store.py::TestMultiThreadedWait::test_wait_prefix_file_store 2025-12-04T12:36:29.5409980Z Running 1 items in this shard: test/distributed/test_store.py::TestMultiThreadedWait::test_wait_tcp_store 2025-12-04T12:36:29.5410827Z Running 1 items in this shard: test/distributed/test_store.py::TestMultiThreadedWait::test_wait_tcp_store_uv 2025-12-04T12:36:29.5411695Z Running 1 items in this shard: test/distributed/test_store.py::TimeoutTest::test_interrupt_doesnt_break_wait 2025-12-04T12:36:29.5412572Z Running 1 items in this shard: test/distributed/test_store.py::InitPgWithNonUvStore::test_with_env_var 2025-12-04T12:36:29.5413404Z Running 1 items in this shard: test/distributed/test_store.py::InitPgWithNonUvStore::test_with_url_param 2025-12-04T12:36:29.5414520Z Running 1 items in this shard: test/distributed/test_store.py::TestClientProtocol::test_client_connect 2025-12-04T12:36:29.5415093Z 2025-12-04T12:36:29.5415427Z Finished distributed/test_store 1/1 ... [2025-12-04 12:36:29.523131][13415.625258828], took 8.45min 2025-12-04T12:36:29.5789734Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-4402e4ca07679d5e.xml 2025-12-04T12:36:29.6591843Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-59586d5fa8d9df00.xml 2025-12-04T12:36:29.6876667Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-904d5bf6ccd1c7aa.xml 2025-12-04T12:36:29.7160736Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c074807310bb3c83.xml 2025-12-04T12:36:29.7446690Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-8f207b69f7a673c5.xml 2025-12-04T12:36:29.7725087Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-cb7ce4d5a847e19b.xml 2025-12-04T12:36:29.8007886Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-52257f6acad204d5.xml 2025-12-04T12:36:29.8296613Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-b16442ffcab7dd38.xml 2025-12-04T12:36:29.8617559Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-a12f788b611f7140.xml 2025-12-04T12:36:29.8949046Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-dfaa32f264045766.xml 2025-12-04T12:36:29.9256719Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-8da17e1f6a078343.xml 2025-12-04T12:36:29.9648451Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-e8525e9dd27a79c3.xml 2025-12-04T12:36:29.9970389Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-6f41a4ce4013a7c2.xml 2025-12-04T12:36:30.0244818Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-0470d4ac72d7a50e.xml 2025-12-04T12:36:30.0609981Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-3a36dde009a45dc5.xml 2025-12-04T12:36:30.0889327Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-df81badd9797f785.xml 2025-12-04T12:36:30.1195699Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-350d6caf6618d2b5.xml 2025-12-04T12:36:30.1496118Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9a9af3abc1b0b41d.xml 2025-12-04T12:36:30.1777857Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-619ffbde41a10c5d.xml 2025-12-04T12:36:30.2193138Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c5cf03e47a405c4b.xml 2025-12-04T12:36:30.2503678Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-af6e2b8d803b9c4f.xml 2025-12-04T12:36:30.2832027Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-97c2051ccee23a9b.xml 2025-12-04T12:36:30.3121669Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-482d04b678af0ece.xml 2025-12-04T12:36:30.3562804Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-10499bf9f759075b.xml 2025-12-04T12:36:30.3871452Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-cccae4cdf350788c.xml 2025-12-04T12:36:30.4211134Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9241ea60ff5c054f.xml 2025-12-04T12:36:30.4509649Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c77cdf590d6c4d53.xml 2025-12-04T12:36:30.4966299Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9d86817452e27e08.xml 2025-12-04T12:36:30.5290485Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-4b2a51af148732c1.xml 2025-12-04T12:36:30.5936255Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-36cc6892ce1d13b2.xml 2025-12-04T12:36:30.6210673Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-f3e053be23766a80.xml 
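For context on the "shard N/M" labels and per-file timings in this log (distributed/test_c10d_gloo 2/2 took 16.01min, distributed/test_store took 8.45min, distributed/test_launcher took 0.08min): test files are split across shards by expected duration. The following is only a toy greedy illustration of that idea, not the sharding code PyTorch's harness actually uses; the two "example" entries are invented placeholders.

# Toy illustration of duration-based sharding (greedy longest-first).
# NOT the actual sharding logic of the CI job above; the 16.01, 8.45 and
# 0.08 minute figures come from this log, the rest are placeholders.
import heapq

def shard_by_duration(durations: dict[str, float], num_shards: int) -> list[list[str]]:
    # Min-heap of (total_minutes, shard_index); always give the next-longest
    # test file to the currently lightest shard.
    heap = [(0.0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    shards: list[list[str]] = [[] for _ in range(num_shards)]
    for name, minutes in sorted(durations.items(), key=lambda kv: -kv[1]):
        total, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (total + minutes, idx))
    return shards

durations = {
    "distributed/test_c10d_gloo": 16.01,  # from this log (shard 2/2)
    "distributed/test_store": 8.45,       # from this log
    "distributed/test_launcher": 0.08,    # from this log
    "distributed/test_example_a": 5.0,    # placeholder
    "distributed/test_example_b": 3.0,    # placeholder
}
print(shard_by_duration(durations, 2))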
2025-12-04T12:36:30.6499048Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c4831c850757caf9.xml 2025-12-04T12:36:30.6816854Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-15d93d0dff93e000.xml 2025-12-04T12:36:30.7158185Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-97c506133bdf82e2.xml 2025-12-04T12:36:30.7449842Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-760dbf0b3a7076aa.xml 2025-12-04T12:36:30.7739394Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-0c82075f9025f767.xml 2025-12-04T12:36:30.8033274Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-3e48901c9a873f45.xml 2025-12-04T12:36:30.8296691Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-7a3e24041d2ef943.xml 2025-12-04T12:36:30.8589884Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-04f97f132e861e46.xml 2025-12-04T12:36:30.8881428Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-de7756f9c1641da6.xml 2025-12-04T12:36:30.9179896Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-1bca8b988eb41bac.xml 2025-12-04T12:36:30.9428502Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-286102c30fda404f.xml 2025-12-04T12:36:30.9731280Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-bedd718db983bac0.xml 2025-12-04T12:36:31.0011092Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-eefe267c1e87f355.xml 2025-12-04T12:36:31.0331366Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-75d254d8d2b940a0.xml 2025-12-04T12:36:31.0656647Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-d29ba7e8ccb2ecb1.xml 2025-12-04T12:36:31.0929742Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-f94b0a13c28491ca.xml 2025-12-04T12:36:31.1216795Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-d359118932c9b995.xml 2025-12-04T12:36:31.1515482Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-f756c8323a1d09e8.xml 2025-12-04T12:36:31.1829508Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-1240aa89fcaf1417.xml 2025-12-04T12:36:31.2167764Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-d2b8c5d98b2db0d3.xml 2025-12-04T12:36:31.2510557Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-f04571bca3f6577a.xml 2025-12-04T12:36:31.2812489Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-62a8c36072d028e3.xml 2025-12-04T12:36:31.3098544Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-085d0338122bdd88.xml 2025-12-04T12:36:31.3353918Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-3aff210f96c86539.xml 2025-12-04T12:36:31.3669580Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-50126680f72685fd.xml 2025-12-04T12:36:31.3962862Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-265b5fcb7c5f4add.xml 2025-12-04T12:36:31.4227687Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-658bdaf47e5d9fc0.xml 2025-12-04T12:36:31.4516183Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9359cc0c923fd357.xml 2025-12-04T12:36:31.4816788Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-8f02a426ff307186.xml 2025-12-04T12:36:31.5256640Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-724b43aaa3e86430.xml 2025-12-04T12:36:31.5609236Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-849142fee7d3fe7a.xml 2025-12-04T12:36:31.5915321Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-40597616ec98a508.xml 2025-12-04T12:36:31.6201250Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-8de5caa0f44ab195.xml 2025-12-04T12:36:31.6513740Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-70a2189ada91e7b4.xml 2025-12-04T12:36:31.6809749Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c9872634fae2a2a2.xml 2025-12-04T12:36:31.7130659Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-a2053711ae870746.xml 2025-12-04T12:36:31.7735633Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-4ef2fc9fec34c264.xml 
2025-12-04T12:36:31.8040837Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-b4a7a6fe6b411ab3.xml 2025-12-04T12:36:31.8331493Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-3a29202ead173617.xml 2025-12-04T12:36:31.8650170Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-a61155d6f938b2cc.xml 2025-12-04T12:36:31.8968369Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9b41f59ee0cfeb75.xml 2025-12-04T12:36:31.9275440Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-307094c901db62b6.xml 2025-12-04T12:36:31.9697351Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c3332bb3687882a6.xml 2025-12-04T12:36:32.0033403Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-422a006275d6f6d2.xml 2025-12-04T12:36:32.0339347Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-b842f6182997ffac.xml 2025-12-04T12:36:32.0638782Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-b78401240c2392a0.xml 2025-12-04T12:36:32.0929667Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-05d0a7320ac2f2e5.xml 2025-12-04T12:36:32.1207646Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-56eae3d52dfef9e0.xml 2025-12-04T12:36:32.1512460Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9598d85b65a1dd25.xml 2025-12-04T12:36:32.1806559Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9433dc4d80f4e3fb.xml 2025-12-04T12:36:32.2073193Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-fbcb32de5a4aaa3b.xml 2025-12-04T12:36:32.2359209Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c2ec29ec8ed5fa00.xml 2025-12-04T12:36:32.2692078Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-7fbbe4d1eb982186.xml 2025-12-04T12:36:32.2990256Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-fd140e21219ecfa7.xml 2025-12-04T12:36:32.3289438Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-db87db829f00dbc2.xml 2025-12-04T12:36:32.3595693Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-0645df0da2606eed.xml 2025-12-04T12:36:32.3894821Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-5be93906577b570a.xml 2025-12-04T12:36:32.4216656Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-cb158e2a6356a16b.xml 2025-12-04T12:36:32.4516966Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-1efba08034268a13.xml 2025-12-04T12:36:32.4828690Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-1cfd040a8029b228.xml 2025-12-04T12:36:32.5125676Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-26b3b1f841eed644.xml 2025-12-04T12:36:32.5432194Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-8363e71100168a00.xml 2025-12-04T12:36:32.5731923Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-449df693536afa26.xml 2025-12-04T12:36:32.5992849Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-a65d4b001bf04cc5.xml 2025-12-04T12:36:32.6281223Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-a841b70017fda049.xml 2025-12-04T12:36:32.6559961Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-4cd5969ebb7a8971.xml 2025-12-04T12:36:32.6857324Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-8be9136d4926e94c.xml 2025-12-04T12:36:32.7142311Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-804b0113d0e9f4eb.xml 2025-12-04T12:36:32.7431762Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-03bdd58db5584705.xml 2025-12-04T12:36:32.7744385Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-ebfe437a9c6a9ad8.xml 2025-12-04T12:36:32.8642066Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-a1970e4a95b6fcaa.xml 2025-12-04T12:36:32.8947844Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-e7b46a0191ac24db.xml 2025-12-04T12:36:32.9248439Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-7c2efcfcaf566fcf.xml 2025-12-04T12:36:32.9576552Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-98f57491a6bbb62c.xml 
2025-12-04T12:36:32.9900232Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-4dd659a2a136b6f5.xml 2025-12-04T12:36:33.0217839Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-ed9bd2c70267a528.xml 2025-12-04T12:36:33.0971748Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-2244d9322e930674.xml 2025-12-04T12:36:33.1298574Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c9423adcd896df81.xml 2025-12-04T12:36:33.1663170Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-627f0929637f665a.xml 2025-12-04T12:36:33.2007817Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c04d647e93aaefae.xml 2025-12-04T12:36:33.2309597Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-91ec2ebad1950a1b.xml 2025-12-04T12:36:33.2618899Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-70b9e0e0ebbe8c78.xml 2025-12-04T12:36:33.2912268Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-64a3329e5489e4a1.xml 2025-12-04T12:36:33.3201777Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-286648de75ab2791.xml 2025-12-04T12:36:33.3531015Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-aba6b9e907975cf2.xml 2025-12-04T12:36:33.3842903Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9858e278543d5c8a.xml 2025-12-04T12:36:33.4159650Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-7b426550a7173d4d.xml 2025-12-04T12:36:33.4468064Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-687616f36cc07fc7.xml 2025-12-04T12:36:33.4796035Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-97db8e5668905dba.xml 2025-12-04T12:36:33.5081806Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-ad88807c8d637ecf.xml 2025-12-04T12:36:33.5456771Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-b4bc3ece8958b620.xml 2025-12-04T12:36:33.5820362Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-dad34641aa2ec6a8.xml 2025-12-04T12:36:33.6161968Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-4581fd35a1a9c062.xml 2025-12-04T12:36:33.6482094Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-3bb2e8b2b3b8f504.xml 2025-12-04T12:36:33.6762905Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_store/distributed.test_store-21612ac6bc46612d.xml 2025-12-04T12:36:33.7614779Z Running distributed/test_c10d_nccl 1/3 ... [2025-12-04 12:36:33.760903][13419.863033446] 2025-12-04T12:36:33.7615521Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:36:33.7616835Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_c10d_nccl.py', '--shard-id=1', '--num-shards=3', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:36:33.761273] 2025-12-04T12:51:08.4677699Z 2025-12-04T12:51:08.4679126Z distributed/test_c10d_nccl 1/3 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_nccl_1.3_dda30713f52ab06d_.log 2025-12-04T12:51:08.4727841Z Running 91 items in this shard: test/distributed/test_c10d_nccl.py::RendezvousEnvTest::test_common_errors, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLNoGPUTest::test_init_no_gpus, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_abort_in_destroy_multi_pgs, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_abort_pg, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_block_current_stream, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_close_pg_eager_init_False, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_close_pg_eager_init_True, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_comm_split_group, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_extend_nccl_pg_timeout_backend_nccl, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_file_store_check, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_init_process_group_nccl_timeout, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_init_with_idx, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_nan_assert_bfloat16, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_nan_assert_float8_e5m2, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_new_group_eager_init_True, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_non_blocking_init, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_backend_properties, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_multiple_comms, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_multiple_exclusions, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_validation, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_accumulate_gradients_module, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_bf16_compress_wrapper_is_view, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_dataclass_output, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_dynamic_weight_sharing, 
test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_False, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_unused_params_use_reentrant_False, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_comm_hook_allreduce_with_then_hook_nccl, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_complex_params, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_mixed_real_and_complex_params, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_with_lazy_parameters, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_default_ddp_comm_hooks_nccl_is_view, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_find_unused_parameters_kwarg_debug_info, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_find_unused_parameters_kwarg_debug_off, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_find_unused_parameters_kwarg_grad_is_view_debug_detail, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_find_unused_parameters_kwarg_grad_is_view_debug_off, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_fp16, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_fp16_compress_wrapper_nccl, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_fp16_grad_is_view, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_grad_layout_1devicemodule_1replicaperprocess, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_nccl_backend_2gpu_module, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_nccl_backend_4gpu_module, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_nccl_backend_single_device_module_device_ids_None, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_nccl_backend_single_device_module_empty_device_ids, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_nccl_propagate_error_reason, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_no_grad, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_param_layout_mismatch_error, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_powerSGD_ddp_comm_hook_nccl, test/distributed/test_c10d_nccl.py::WorkHookTest::test_on_completion_hook_all_gather_object, test/distributed/test_c10d_nccl.py::WorkHookTest::test_on_completion_hook_broadcast, test/distributed/test_c10d_nccl.py::WorkHookTest::test_on_completion_hook_seq, test/distributed/test_c10d_nccl.py::WorkHookTest::test_on_completion_hook_with_ddp, test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_error_detection_and_propagation, test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_invalid_nccl_blocking_wait_env, test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_nccl_non_blocking_wait_with_barrier, test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_restart_pg_after_error, test/distributed/test_c10d_nccl.py::CommTest::test_pass_nccl_options_config, test/distributed/test_c10d_nccl.py::CommTest::test_reduce_scatter_tensor_coalesced, test/distributed/test_c10d_nccl.py::CommTest::test_sequence_num_incremented_nccl_default, test/distributed/test_c10d_nccl.py::CommTest::test_sequence_num_incremented_nccl_subgroup, 
test/distributed/test_c10d_nccl.py::CommTest::test_sequence_num_set_nccl_new_group, test/distributed/test_c10d_nccl.py::CommTest::test_wait_tensor, test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_allgather_base, test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_allgather_float8_float8_e5m2, test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_init_process_group_optional_backend, test/distributed/test_c10d_nccl.py::LargeCommTest::test_batch_send_recv_subgroup_group_rank_True, test/distributed/test_c10d_nccl.py::LargeCommTest::test_broadcast_object_list_subgroup_set_device0_group_rank_False, test/distributed/test_c10d_nccl.py::LargeCommTest::test_broadcast_object_list_subgroup_set_device1_group_rank_True, test/distributed/test_c10d_nccl.py::LargeCommTest::test_gather_subgroup_group_rank_True, test/distributed/test_c10d_nccl.py::LargeCommTest::test_reduce_subgroup_group_rank_False, test/distributed/test_c10d_nccl.py::LargeCommTest::test_scatter_object_list_subgroup_group_rank_False, test/distributed/test_c10d_nccl.py::LargeCommTest::test_scatter_subgroup_group_rank_True, test/distributed/test_c10d_nccl.py::LargeCommTest::test_send_recv_object_list_subgroup_set_device0_group_rank_False, test/distributed/test_c10d_nccl.py::LargeCommTest::test_send_recv_object_list_subgroup_set_device1_group_rank_False, test/distributed/test_c10d_nccl.py::LargeCommTest::test_send_recv_subgroup_group_rank_False_async_op_False, test/distributed/test_c10d_nccl.py::LargeCommTest::test_send_recv_subgroup_group_rank_True_async_op_False, test/distributed/test_c10d_nccl.py::SparseCollective::test_ddp_set_sparse_metadata, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_allgather_uneven_timing_enabled_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_dump_pipe, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_fr_record_multiple_resets_timing_enabled_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_fr_record_reset_circular_buffer_full_timing_enabled_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_fr_record_reset_partial_overwrite_timing_enabled_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_fr_record_reset_partial_overwrite_timing_enabled_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_individual_send_recv_op_sizes1_timing_enabled_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_long, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_short_pickle_timing_enabled_False_include_collectives_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_short_pickle_timing_enabled_True_include_collectives_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_short_pickle_timing_enabled_True_include_collectives_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_active_timing_enabled_False_only_active_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_active_timing_enabled_True_only_active_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_all_works_retired, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLLargerScaleTest::test_comm_split_group_larger_scale 2025-12-04T12:51:08.4771949Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::RendezvousEnvTest::test_common_errors 2025-12-04T12:51:08.4772909Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLNoGPUTest::test_init_no_gpus 
2025-12-04T12:51:08.4774239Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_abort_in_destroy_multi_pgs 2025-12-04T12:51:08.4775298Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_abort_pg 2025-12-04T12:51:08.4776334Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_block_current_stream 2025-12-04T12:51:08.4777499Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_close_pg_eager_init_False 2025-12-04T12:51:08.4778847Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_close_pg_eager_init_True 2025-12-04T12:51:08.4779951Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_comm_split_group 2025-12-04T12:51:08.4781116Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_extend_nccl_pg_timeout_backend_nccl 2025-12-04T12:51:08.4782277Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_file_store_check 2025-12-04T12:51:08.4783422Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_init_process_group_nccl_timeout 2025-12-04T12:51:08.4784555Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_init_with_idx 2025-12-04T12:51:08.4785602Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_nan_assert_bfloat16 2025-12-04T12:51:08.4786708Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_nan_assert_float8_e5m2 2025-12-04T12:51:08.4787953Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_new_group_eager_init_True 2025-12-04T12:51:08.4789062Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_non_blocking_init 2025-12-04T12:51:08.4790195Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_backend_properties 2025-12-04T12:51:08.4791468Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_multiple_comms 2025-12-04T12:51:08.4792678Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_multiple_exclusions 2025-12-04T12:51:08.4793823Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_validation 2025-12-04T12:51:08.4794937Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_accumulate_gradients_module 2025-12-04T12:51:08.4796114Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_bf16_compress_wrapper_is_view 2025-12-04T12:51:08.4797228Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_dataclass_output 2025-12-04T12:51:08.4798395Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_dynamic_weight_sharing 2025-12-04T12:51:08.4799795Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_False 
2025-12-04T12:51:08.4801116Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_unused_params_use_reentrant_False 2025-12-04T12:51:08.4802475Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_comm_hook_allreduce_with_then_hook_nccl 2025-12-04T12:51:08.4803633Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_complex_params 2025-12-04T12:51:08.4804746Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_mixed_real_and_complex_params 2025-12-04T12:51:08.4805879Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_with_lazy_parameters 2025-12-04T12:51:08.4807035Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_default_ddp_comm_hooks_nccl_is_view 2025-12-04T12:51:08.4808301Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_find_unused_parameters_kwarg_debug_info 2025-12-04T12:51:08.4809792Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_find_unused_parameters_kwarg_debug_off 2025-12-04T12:51:08.4811088Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_find_unused_parameters_kwarg_grad_is_view_debug_detail 2025-12-04T12:51:08.4812453Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_find_unused_parameters_kwarg_grad_is_view_debug_off 2025-12-04T12:51:08.4813829Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_fp16 2025-12-04T12:51:08.4814916Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_fp16_compress_wrapper_nccl 2025-12-04T12:51:08.4816043Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_fp16_grad_is_view 2025-12-04T12:51:08.4817274Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_grad_layout_1devicemodule_1replicaperprocess 2025-12-04T12:51:08.4818617Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_nccl_backend_2gpu_module 2025-12-04T12:51:08.4819780Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_nccl_backend_4gpu_module 2025-12-04T12:51:08.4821053Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_nccl_backend_single_device_module_device_ids_None 2025-12-04T12:51:08.4822467Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_nccl_backend_single_device_module_empty_device_ids 2025-12-04T12:51:08.4823813Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_nccl_propagate_error_reason 2025-12-04T12:51:08.4824901Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_no_grad 2025-12-04T12:51:08.4826086Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_param_layout_mismatch_error 2025-12-04T12:51:08.4827211Z Running 1 items in this shard: 
test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_powerSGD_ddp_comm_hook_nccl 2025-12-04T12:51:08.4828285Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::WorkHookTest::test_on_completion_hook_all_gather_object 2025-12-04T12:51:08.4829284Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::WorkHookTest::test_on_completion_hook_broadcast 2025-12-04T12:51:08.4830207Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::WorkHookTest::test_on_completion_hook_seq 2025-12-04T12:51:08.4831248Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::WorkHookTest::test_on_completion_hook_with_ddp 2025-12-04T12:51:08.4832221Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_error_detection_and_propagation 2025-12-04T12:51:08.4833281Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_invalid_nccl_blocking_wait_env 2025-12-04T12:51:08.4834314Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_nccl_non_blocking_wait_with_barrier 2025-12-04T12:51:08.4835316Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_restart_pg_after_error 2025-12-04T12:51:08.4836213Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_pass_nccl_options_config 2025-12-04T12:51:08.4837090Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_reduce_scatter_tensor_coalesced 2025-12-04T12:51:08.4838025Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_sequence_num_incremented_nccl_default 2025-12-04T12:51:08.4838982Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_sequence_num_incremented_nccl_subgroup 2025-12-04T12:51:08.4839909Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_sequence_num_set_nccl_new_group 2025-12-04T12:51:08.4840733Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_wait_tensor 2025-12-04T12:51:08.4841684Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_allgather_base 2025-12-04T12:51:08.4842918Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_allgather_float8_float8_e5m2 2025-12-04T12:51:08.4844265Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_init_process_group_optional_backend 2025-12-04T12:51:08.4845468Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_batch_send_recv_subgroup_group_rank_True 2025-12-04T12:51:08.4846564Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_broadcast_object_list_subgroup_set_device0_group_rank_False 2025-12-04T12:51:08.4847777Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_broadcast_object_list_subgroup_set_device1_group_rank_True 2025-12-04T12:51:08.4848837Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_gather_subgroup_group_rank_True 2025-12-04T12:51:08.4849782Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_reduce_subgroup_group_rank_False 2025-12-04T12:51:08.4850816Z Running 1 items in this shard: 
test/distributed/test_c10d_nccl.py::LargeCommTest::test_scatter_object_list_subgroup_group_rank_False 2025-12-04T12:51:08.4851821Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_scatter_subgroup_group_rank_True 2025-12-04T12:51:08.4852881Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_send_recv_object_list_subgroup_set_device0_group_rank_False 2025-12-04T12:51:08.4854380Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_send_recv_object_list_subgroup_set_device1_group_rank_False 2025-12-04T12:51:08.4855667Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_send_recv_subgroup_group_rank_False_async_op_False 2025-12-04T12:51:08.4856894Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_send_recv_subgroup_group_rank_True_async_op_False 2025-12-04T12:51:08.4858029Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::SparseCollective::test_ddp_set_sparse_metadata 2025-12-04T12:51:08.4859103Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_allgather_uneven_timing_enabled_False 2025-12-04T12:51:08.4860102Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_dump_pipe 2025-12-04T12:51:08.4861165Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_fr_record_multiple_resets_timing_enabled_False 2025-12-04T12:51:08.4862424Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_fr_record_reset_circular_buffer_full_timing_enabled_False 2025-12-04T12:51:08.4863711Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_fr_record_reset_partial_overwrite_timing_enabled_False 2025-12-04T12:51:08.4864978Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_fr_record_reset_partial_overwrite_timing_enabled_True 2025-12-04T12:51:08.4866289Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_individual_send_recv_op_sizes1_timing_enabled_True 2025-12-04T12:51:08.4867232Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_long 2025-12-04T12:51:08.4868193Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_short_pickle_timing_enabled_False_include_collectives_False 2025-12-04T12:51:08.4869375Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_short_pickle_timing_enabled_True_include_collectives_False 2025-12-04T12:51:08.4870524Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_short_pickle_timing_enabled_True_include_collectives_True 2025-12-04T12:51:08.4871684Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_active_timing_enabled_False_only_active_False 2025-12-04T12:51:08.4872839Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_active_timing_enabled_True_only_active_True 2025-12-04T12:51:08.4873875Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_all_works_retired 2025-12-04T12:51:08.4874894Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLLargerScaleTest::test_comm_split_group_larger_scale 2025-12-04T12:51:08.4875531Z 
2025-12-04T12:51:08.4875871Z Finished distributed/test_c10d_nccl 1/3 ... [2025-12-04 12:51:08.469560][14294.571687817], took 14.58min 2025-12-04T12:51:08.5412250Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-424c7796b1d4da37.xml 2025-12-04T12:51:08.6138596Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7bb4f0d3928e2ed2.xml 2025-12-04T12:51:08.6435649Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-33318bd0fe5ba50d.xml 2025-12-04T12:51:08.6720491Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-40835271667093d0.xml 2025-12-04T12:51:08.6967376Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-c5cf88214d708090.xml 2025-12-04T12:51:08.7259969Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-fe7d871fae3e5a4d.xml 2025-12-04T12:51:08.7538543Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a2f486ca4ce8f2e5.xml 2025-12-04T12:51:08.7806092Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2514e248fb19a6f7.xml 2025-12-04T12:51:08.8090324Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5fddd299999d2001.xml 2025-12-04T12:51:08.8355039Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-526fccf11a728f1b.xml 2025-12-04T12:51:08.8656846Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ae02a87d7dee25c6.xml 2025-12-04T12:51:08.8933645Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-92c41cc069cc4d37.xml 2025-12-04T12:51:08.9248245Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-8ae44307030d30dc.xml 2025-12-04T12:51:08.9536633Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9417cbc175c56634.xml 2025-12-04T12:51:08.9811681Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-14e447da1762fe9d.xml 2025-12-04T12:51:09.0081880Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-f1da8f898c8da1fd.xml 2025-12-04T12:51:09.0371402Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-60a2acbc27c8df8e.xml 2025-12-04T12:51:09.0616558Z Parsing testcases for 
test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9cce25ecd2936a9e.xml 2025-12-04T12:51:09.0871145Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7981bf6fc2476012.xml 2025-12-04T12:51:09.1156874Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-68308ed30b7c8249.xml 2025-12-04T12:51:09.1450584Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7d82f8c0c3f71993.xml 2025-12-04T12:51:09.1736620Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-bc89766ea636312a.xml 2025-12-04T12:51:09.2016684Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-14132ee63704b359.xml 2025-12-04T12:51:09.2309326Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d030ce3f22be0dac.xml 2025-12-04T12:51:09.2594424Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-08e08a8b7a3c9688.xml 2025-12-04T12:51:09.2870592Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-04413b57dd0bd1bb.xml 2025-12-04T12:51:09.3192239Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ede7c35169d447e5.xml 2025-12-04T12:51:09.3509219Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-03d41d77810a93c2.xml 2025-12-04T12:51:09.3777619Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e5d454fc87664e79.xml 2025-12-04T12:51:09.4096661Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9b5d1be0cdc61898.xml 2025-12-04T12:51:09.4411186Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-cb65338c3c68b015.xml 2025-12-04T12:51:09.4869807Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ea9d6160bdcb14ea.xml 2025-12-04T12:51:09.5137626Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-54b6e103bc500b82.xml 2025-12-04T12:51:09.5416881Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-43cbd90ab1e74433.xml 2025-12-04T12:51:09.5881245Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e1a5076677cc9040.xml 2025-12-04T12:51:09.6320166Z Parsing 
testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5b874423f18e9e6f.xml 2025-12-04T12:51:09.6592079Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-42a99cc8097c27cf.xml 2025-12-04T12:51:09.6873625Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ff575ff252e1ccc1.xml 2025-12-04T12:51:09.7190092Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-8f0aac570bbe3c22.xml 2025-12-04T12:51:09.7669537Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1e46e2d43687e8c9.xml 2025-12-04T12:51:09.7950015Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-8983151d5ee422e9.xml 2025-12-04T12:51:09.8591105Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-21007923fa28eb94.xml 2025-12-04T12:51:09.8878307Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a6d199e27a3f20ba.xml 2025-12-04T12:51:09.9171280Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-333908f1f4e63432.xml 2025-12-04T12:51:09.9455985Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-10f6223b6a5799e8.xml 2025-12-04T12:51:09.9748389Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0488ec6f0d084de4.xml 2025-12-04T12:51:10.0069734Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d0fe3ffcefb63fed.xml 2025-12-04T12:51:10.0390867Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-aede6c2e8b0a576d.xml 2025-12-04T12:51:10.0710855Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a402d736ec805725.xml 2025-12-04T12:51:10.0976782Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3d885be53a1fd8f7.xml 2025-12-04T12:51:10.1291116Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e68bb841a642474c.xml 2025-12-04T12:51:10.1540274Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-f1f78f076f606689.xml 2025-12-04T12:51:10.1832521Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-31755bde92c246ad.xml 
2025-12-04T12:51:10.2110828Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-954c6b44d604c3c5.xml 2025-12-04T12:51:10.2429692Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-edecd7ac68a78e0c.xml 2025-12-04T12:51:10.2759702Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1b2e7a904a2bde2c.xml 2025-12-04T12:51:10.3097172Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-8e8e63e24b22fdf3.xml 2025-12-04T12:51:10.3444841Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-cac81d05bb378553.xml 2025-12-04T12:51:10.3706009Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-44879a601c328a3c.xml 2025-12-04T12:51:10.4017054Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-fbb8b0d505262428.xml 2025-12-04T12:51:10.4323782Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0417ada3fc299914.xml 2025-12-04T12:51:10.4616049Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-6ec92773f760b446.xml 2025-12-04T12:51:10.4933493Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-27748f3e1c06d895.xml 2025-12-04T12:51:10.5217559Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5b8215da1349e52d.xml 2025-12-04T12:51:10.5524121Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-97e205e1ddd96df1.xml 2025-12-04T12:51:10.5826210Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-8ef86c99e3cf3225.xml 2025-12-04T12:51:10.6082005Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-6a001d92e92403d5.xml 2025-12-04T12:51:10.6370043Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1548f33eb44681e7.xml 2025-12-04T12:51:10.6681504Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-85e41dba7cec0336.xml 2025-12-04T12:51:10.6959491Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-c9aa6140346d940e.xml 2025-12-04T12:51:10.7291981Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-677b6de586c48a10.xml 2025-12-04T12:51:10.7592590Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-129a903d843c6506.xml 2025-12-04T12:51:10.7891305Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a8e0d0b35114e966.xml 2025-12-04T12:51:10.8178157Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-00b8787048f81655.xml 2025-12-04T12:51:10.8490002Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9749a48df2369fa2.xml 2025-12-04T12:51:10.8797651Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7b61630e604191fa.xml 2025-12-04T12:51:10.9292329Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-babd79b2628f20c6.xml 2025-12-04T12:51:10.9634708Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-c321a873e0094d80.xml 2025-12-04T12:51:10.9948746Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9d5bf7025c6dd7a3.xml 2025-12-04T12:51:11.0216454Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-6fdafbad56912bb4.xml 2025-12-04T12:51:11.0486837Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-fac34b62cd6cb996.xml 2025-12-04T12:51:11.0799846Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-18c9f847c7541135.xml 2025-12-04T12:51:11.1079348Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-aa0ccd92b1be664c.xml 2025-12-04T12:51:11.1344920Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-42a2588e5ba30eef.xml 2025-12-04T12:51:11.2438465Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-6de735cb6d76f6db.xml 2025-12-04T12:51:11.2881985Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-b9c0c81401e31d8c.xml 2025-12-04T12:51:11.3212619Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-554a092032e2d568.xml 2025-12-04T12:51:11.3522345Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1bdd7c0fac046180.xml 2025-12-04T12:51:11.3838341Z Parsing testcases for 
test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-b30b289e682ac666.xml 2025-12-04T12:51:11.4120871Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-db0ee76b53b2ddf3.xml 2025-12-04T12:51:11.4428193Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4fbe3fc8cf0cb3d0.xml 2025-12-04T12:51:12.5441293Z Uploading artifacts took 1.03 seconds 2025-12-04T12:51:12.5444536Z Running distributed/elastic/events/lib_test 1/1 ... [2025-12-04 12:51:12.544182][14298.646312064] 2025-12-04T12:51:12.5445407Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:51:12.5448082Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/events/lib_test.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:51:12.544594] 2025-12-04T12:51:16.3690818Z 2025-12-04T12:51:16.3692020Z distributed/elastic/events/lib_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.events.lib_test_1.1_7071ab3e44d7ad6e_.log 2025-12-04T12:51:16.3697323Z Running 8 items in this shard: test/distributed/elastic/events/lib_test.py::EventLibTest::test_event_created, test/distributed/elastic/events/lib_test.py::EventLibTest::test_event_deser, test/distributed/elastic/events/lib_test.py::EventLibTest::test_get_or_create_logger, test/distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_construct_and_record_rdzv_event, test/distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_construct_and_record_rdzv_event_does_not_run_if_invalid_dest, test/distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_created, test/distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_deserialize, test/distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_str 2025-12-04T12:51:16.3701255Z 2025-12-04T12:51:16.3701626Z Finished distributed/elastic/events/lib_test 1/1 ... [2025-12-04 12:51:16.368523][14302.4706534], took 0.06min 2025-12-04T12:51:16.4489869Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.elastic.events.lib_test/distributed.elastic.events.lib_test-07a790705f8742f5.xml 2025-12-04T12:51:16.5246814Z Running distributed/elastic/metrics/api_test 1/1 ... [2025-12-04 12:51:16.524068][14302.626199633] 2025-12-04T12:51:16.5247471Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:51:16.5248906Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/metrics/api_test.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:51:16.524450] 2025-12-04T12:51:20.3490299Z 2025-12-04T12:51:20.3491523Z distributed/elastic/metrics/api_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.metrics.api_test_1.1_6dbe93286f341ee5_.log 2025-12-04T12:51:20.3494124Z Running 3 items in this shard: test/distributed/elastic/metrics/api_test.py::MetricsApiTest::test_get_metric_name, test/distributed/elastic/metrics/api_test.py::MetricsApiTest::test_inheritance, test/distributed/elastic/metrics/api_test.py::MetricsApiTest::test_profile 2025-12-04T12:51:20.3495938Z 2025-12-04T12:51:20.3496373Z Finished distributed/elastic/metrics/api_test 1/1 ... [2025-12-04 12:51:20.348520][14306.450634137], took 0.06min 2025-12-04T12:51:20.4296386Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.elastic.metrics.api_test/distributed.elastic.metrics.api_test-089696776a609d56.xml 2025-12-04T12:51:20.5033130Z Running distributed/elastic/timer/api_test 1/1 ... [2025-12-04 12:51:20.502711][14306.604841985] 2025-12-04T12:51:20.5033780Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:51:20.5035050Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/timer/api_test.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:51:20.503093] 2025-12-04T12:51:22.2082754Z 2025-12-04T12:51:22.2083938Z distributed/elastic/timer/api_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.timer.api_test_1.1_13380cf2203031af_.log 2025-12-04T12:51:22.2084904Z 2025-12-04T12:51:22.2085598Z Finished distributed/elastic/timer/api_test 1/1 ... [2025-12-04 12:51:22.208078][14308.310207142], took 0.03min 2025-12-04T12:51:22.3338422Z Running distributed/elastic/timer/local_timer_example 1/1 ... [2025-12-04 12:51:22.333590][14308.435721198] 2025-12-04T12:51:22.3339133Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:51:22.3341950Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/timer/local_timer_example.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:51:22.333983] 2025-12-04T12:51:40.9961851Z 2025-12-04T12:51:40.9963373Z distributed/elastic/timer/local_timer_example 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.timer.local_timer_example_1.1_4b68dab99362c847_.log 2025-12-04T12:51:40.9965528Z Running 2 items in this shard: test/distributed/elastic/timer/local_timer_example.py::LocalTimerExample::test_example_start_method_spawn, test/distributed/elastic/timer/local_timer_example.py::LocalTimerExample::test_torch_mp_example 2025-12-04T12:51:40.9966744Z 2025-12-04T12:51:40.9967223Z Finished distributed/elastic/timer/local_timer_example 1/1 ... [2025-12-04 12:51:40.995776][14327.09790208], took 0.31min 2025-12-04T12:51:41.0678522Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.elastic.timer.local_timer_example/distributed.elastic.timer.local_timer_example-2bef7f019a87a08a.xml 2025-12-04T12:51:41.2061842Z Running distributed/elastic/timer/local_timer_test 1/1 ... 
[2025-12-04 12:51:41.205956][14327.308086995] 2025-12-04T12:51:41.2062526Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:51:41.2065174Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/timer/local_timer_test.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:51:41.206305] 2025-12-04T12:51:50.1446873Z 2025-12-04T12:51:50.1448467Z distributed/elastic/timer/local_timer_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.timer.local_timer_test_1.1_d1c67eef8711d433_.log 2025-12-04T12:51:50.1456803Z Running 14 items in this shard: test/distributed/elastic/timer/local_timer_test.py::LocalTimerTest::test_client_interaction, test/distributed/elastic/timer/local_timer_test.py::LocalTimerTest::test_exception_propagation, test/distributed/elastic/timer/local_timer_test.py::LocalTimerTest::test_get_timer_recursive, test/distributed/elastic/timer/local_timer_test.py::LocalTimerTest::test_happy_path, test/distributed/elastic/timer/local_timer_test.py::LocalTimerTest::test_no_client, test/distributed/elastic/timer/local_timer_test.py::LocalTimerTest::test_timer, test/distributed/elastic/timer/local_timer_test.py::MultiprocessingRequestQueueTest::test_get, test/distributed/elastic/timer/local_timer_test.py::MultiprocessingRequestQueueTest::test_get_less_than_size, test/distributed/elastic/timer/local_timer_test.py::MultiprocessingRequestQueueTest::test_get_size, test/distributed/elastic/timer/local_timer_test.py::LocalTimerServerTest::test_acquire_release, test/distributed/elastic/timer/local_timer_test.py::LocalTimerServerTest::test_expired_timers, test/distributed/elastic/timer/local_timer_test.py::LocalTimerServerTest::test_valid_timers, test/distributed/elastic/timer/local_timer_test.py::LocalTimerServerTest::test_watchdog_call_count, test/distributed/elastic/timer/local_timer_test.py::LocalTimerServerTest::test_watchdog_empty_queue 2025-12-04T12:51:50.1464252Z 2025-12-04T12:51:50.1464679Z Finished distributed/elastic/timer/local_timer_test 1/1 ... [2025-12-04 12:51:50.144061][14336.246191173], took 0.15min 2025-12-04T12:51:50.2162104Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.elastic.timer.local_timer_test/distributed.elastic.timer.local_timer_test-7292bedd4140d1cb.xml 2025-12-04T12:51:50.3316160Z Running distributed/elastic/utils/distributed_test 1/1 ... [2025-12-04 12:51:50.331066][14336.433198114] 2025-12-04T12:51:50.3316846Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:51:50.3318175Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/utils/distributed_test.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:51:50.331448] 2025-12-04T12:52:00.7762850Z 2025-12-04T12:52:00.7764307Z distributed/elastic/utils/distributed_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.utils.distributed_test_1.1_2f07a4b12f9c1ea3_.log 2025-12-04T12:52:00.7770087Z Running 8 items in this shard: test/distributed/elastic/utils/distributed_test.py::DistributedUtilTest::test_create_store_multi, test/distributed/elastic/utils/distributed_test.py::DistributedUtilTest::test_create_store_no_port_multi, test/distributed/elastic/utils/distributed_test.py::DistributedUtilTest::test_create_store_single_server, test/distributed/elastic/utils/distributed_test.py::DistributedUtilTest::test_create_store_timeout_on_server, test/distributed/elastic/utils/distributed_test.py::DistributedUtilTest::test_create_store_timeout_on_worker, test/distributed/elastic/utils/distributed_test.py::DistributedUtilTest::test_create_store_with_libuv_support, test/distributed/elastic/utils/distributed_test.py::DistributedUtilTest::test_port_already_in_use_on_server, test/distributed/elastic/utils/distributed_test.py::DistributedUtilTest::test_port_already_in_use_on_worker 2025-12-04T12:52:00.7774600Z 2025-12-04T12:52:00.7775073Z Finished distributed/elastic/utils/distributed_test 1/1 ... [2025-12-04 12:52:00.775966][14346.87809509], took 0.17min 2025-12-04T12:52:00.8474543Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.elastic.utils.distributed_test/distributed.elastic.utils.distributed_test-37b4dd92e3796470.xml 2025-12-04T12:52:00.9812742Z Running distributed/elastic/utils/logging_test 1/1 ... [2025-12-04 12:52:00.980658][14347.082790762] 2025-12-04T12:52:00.9813543Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:52:00.9815031Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/utils/logging_test.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:52:00.981016] 2025-12-04T12:52:04.8050313Z 2025-12-04T12:52:04.8051540Z distributed/elastic/utils/logging_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.utils.logging_test_1.1_aa6947c2b0a1b352_.log 2025-12-04T12:52:04.8053548Z Running 2 items in this shard: test/distributed/elastic/utils/logging_test.py::LoggingTest::test_derive_module_name, test/distributed/elastic/utils/logging_test.py::LoggingTest::test_logger_name 2025-12-04T12:52:04.8054749Z 2025-12-04T12:52:04.8055212Z Finished distributed/elastic/utils/logging_test 1/1 ... [2025-12-04 12:52:04.804598][14350.906723272], took 0.06min 2025-12-04T12:52:04.8758987Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.elastic.utils.logging_test/distributed.elastic.utils.logging_test-ad00506eaa0f6b8e.xml 2025-12-04T12:52:04.9458092Z Running distributed/elastic/utils/util_test 1/1 ... [2025-12-04 12:52:04.945589][14351.047719471] 2025-12-04T12:52:04.9458766Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:52:04.9461295Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/utils/util_test.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:52:04.945926] 2025-12-04T12:52:08.9202195Z 2025-12-04T12:52:08.9203431Z distributed/elastic/utils/util_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.utils.util_test_1.1_c7771d0b50d6c4b7_.log 2025-12-04T12:52:08.9210311Z Running 12 items in this shard: test/distributed/elastic/utils/util_test.py::StoreUtilTest::test_barrier, test/distributed/elastic/utils/util_test.py::StoreUtilTest::test_barrier_hash_store, test/distributed/elastic/utils/util_test.py::StoreUtilTest::test_barrier_timeout_operations, test/distributed/elastic/utils/util_test.py::StoreUtilTest::test_barrier_timeout_rank_tracing, test/distributed/elastic/utils/util_test.py::StoreUtilTest::test_get_all_rank_0, test/distributed/elastic/utils/util_test.py::StoreUtilTest::test_get_all_rank_n, test/distributed/elastic/utils/util_test.py::StoreUtilTest::test_synchronize, test/distributed/elastic/utils/util_test.py::StoreUtilTest::test_synchronize_hash_store, test/distributed/elastic/utils/util_test.py::UtilTest::test_get_logger, test/distributed/elastic/utils/util_test.py::UtilTest::test_get_logger_custom_name, test/distributed/elastic/utils/util_test.py::UtilTest::test_get_logger_different, test/distributed/elastic/utils/util_test.py::UtilTest::test_get_logger_none 2025-12-04T12:52:08.9215760Z 2025-12-04T12:52:08.9216204Z Finished distributed/elastic/utils/util_test 1/1 ... [2025-12-04 12:52:08.919771][14355.021896682], took 0.07min 2025-12-04T12:52:08.9917835Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.elastic.utils.util_test/distributed.elastic.utils.util_test-06e2f9e323fc3569.xml 2025-12-04T12:52:13.2045533Z Running test batch 'tests to run' cost 13539.85 seconds 2025-12-04T12:52:13.2049352Z Emitting td_test_failure_stats_v2 2025-12-04T12:52:13.2052864Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764852733_0df0c996d11011f0919c0242ac110002 2025-12-04T12:52:13.3136202Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764852733_0df0c996d11011f0919c0242ac110002 2025-12-04T12:52:13.3146092Z Emitting td_test_failure_stats_v2 2025-12-04T12:52:13.3147158Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764852733_0e01712ed11011f0919c0242ac110002 2025-12-04T12:52:13.3525258Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764852733_0e01712ed11011f0919c0242ac110002 2025-12-04T12:52:13.3528102Z Emitting td_test_failure_stats_v2 2025-12-04T12:52:13.3529384Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764852733_0e075030d11011f0919c0242ac110002 2025-12-04T12:52:13.3870502Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764852733_0e075030d11011f0919c0242ac110002 2025-12-04T12:52:13.3876299Z Emitting td_test_failure_stats_v2 2025-12-04T12:52:13.3877103Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764852733_0e0c959ad11011f0919c0242ac110002 2025-12-04T12:52:13.4221343Z Done! 
Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764852733_0e0c959ad11011f0919c0242ac110002
2025-12-04T12:52:13.4228132Z Emitting td_test_failure_stats_v2
2025-12-04T12:52:13.4228947Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764852733_0e11f3c8d11011f0919c0242ac110002
2025-12-04T12:52:13.4574534Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764852733_0e11f3c8d11011f0919c0242ac110002
2025-12-04T12:52:13.4575481Z distributed/fsdp/test_fsdp_apply 1/1 failed!
2025-12-04T12:52:13.4575980Z distributed/fsdp/test_fsdp_multiple_wrapping 1/1 failed!
2025-12-04T12:52:13.4576477Z distributed/fsdp/test_fsdp_fine_tune 1/1 failed!
2025-12-04T12:52:13.4576978Z distributed/fsdp/test_fsdp_dtensor_state_dict 1/1 failed!
2025-12-04T12:52:13.4577625Z distributed/fsdp/test_fsdp_core 1/2 failed!
2025-12-04T12:52:14.2638267Z
2025-12-04T12:52:14.2638687Z real 225m46.577s
2025-12-04T12:52:14.2639026Z user 508m57.004s
2025-12-04T12:52:14.2639296Z sys 252m19.493s
2025-12-04T12:52:14.2639536Z + sccache_epilogue
2025-12-04T12:52:14.2639856Z + echo '::group::Sccache Compilation Log'
2025-12-04T12:52:14.2640540Z ##[group]Sccache Compilation Log
2025-12-04T12:52:14.2640946Z + echo '=================== sccache compilation log ==================='
2025-12-04T12:52:14.2641417Z =================== sccache compilation log ===================
2025-12-04T12:52:14.2642121Z + python /var/lib/jenkins/workspace/.ci/pytorch/print_sccache_log.py /var/lib/jenkins/sccache_error.log
2025-12-04T12:52:14.2776426Z + echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
2025-12-04T12:52:14.2777252Z =========== If your build fails, please take a look at the log above for possible reasons ===========
2025-12-04T12:52:14.2777822Z + sccache --show-stats
2025-12-04T12:52:14.2802703Z Compile requests 393
2025-12-04T12:52:14.2803213Z Compile requests executed 0
2025-12-04T12:52:14.2803627Z Cache hits 0
2025-12-04T12:52:14.2803972Z Cache misses 0
2025-12-04T12:52:14.2804354Z Cache hits rate -
2025-12-04T12:52:14.2804736Z Cache timeouts 0
2025-12-04T12:52:14.2805086Z Cache read errors 0
2025-12-04T12:52:14.2805481Z Forced recaches 0
2025-12-04T12:52:14.2805875Z Cache write errors 0
2025-12-04T12:52:14.2806262Z Cache errors 0
2025-12-04T12:52:14.2806610Z Compilations 0
2025-12-04T12:52:14.2807021Z Compilation failures 0
2025-12-04T12:52:14.2807417Z Non-cacheable compilations 0
2025-12-04T12:52:14.2807805Z Non-cacheable calls 7
2025-12-04T12:52:14.2808196Z Non-compilation calls 386
2025-12-04T12:52:14.2808594Z Unsupported compiler calls 0
2025-12-04T12:52:14.2809149Z Average cache write 0.000 s
2025-12-04T12:52:14.2809532Z Average compiler 0.000 s
2025-12-04T12:52:14.2809969Z Average cache read hit 0.000 s
2025-12-04T12:52:14.2810330Z Failed distributed compilations 0
2025-12-04T12:52:14.2810656Z
2025-12-04T12:52:14.2810765Z Non-cacheable reasons:
2025-12-04T12:52:14.2811058Z -E 7
2025-12-04T12:52:14.2811348Z
2025-12-04T12:52:14.2811706Z Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: /
2025-12-04T12:52:14.2812276Z Version (client) 0.10.0
2025-12-04T12:52:14.2812619Z + sccache --stop-server
2025-12-04T12:52:14.2823621Z Stopping sccache server...
2025-12-04T12:52:14.2825878Z Compile requests 393
2025-12-04T12:52:14.2826243Z Compile requests executed 0
2025-12-04T12:52:14.2826669Z Cache hits 0
2025-12-04T12:52:14.2827005Z Cache misses 0
2025-12-04T12:52:14.2827400Z Cache hits rate -
2025-12-04T12:52:14.2827756Z Cache timeouts 0
2025-12-04T12:52:14.2828163Z Cache read errors 0
2025-12-04T12:52:14.2828493Z Forced recaches 0
2025-12-04T12:52:14.2828888Z Cache write errors 0
2025-12-04T12:52:14.2829254Z Cache errors 0
2025-12-04T12:52:14.2829676Z Compilations 0
2025-12-04T12:52:14.2830041Z Compilation failures 0
2025-12-04T12:52:14.2830400Z Non-cacheable compilations 0
2025-12-04T12:52:14.2830758Z Non-cacheable calls 7
2025-12-04T12:52:14.2831104Z Non-compilation calls 386
2025-12-04T12:52:14.2831464Z Unsupported compiler calls 0
2025-12-04T12:52:14.2831929Z Average cache write 0.000 s
2025-12-04T12:52:14.2832282Z Average compiler 0.000 s
2025-12-04T12:52:14.2832647Z Average cache read hit 0.000 s
2025-12-04T12:52:14.2833022Z Failed distributed compilations 0
2025-12-04T12:52:14.2833264Z
2025-12-04T12:52:14.2833371Z Non-cacheable reasons:
2025-12-04T12:52:14.2833656Z -E 7
2025-12-04T12:52:14.2833890Z
2025-12-04T12:52:14.2834146Z Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: /
2025-12-04T12:52:14.2834653Z Version (client) 0.10.0
2025-12-04T12:52:14.2834994Z + echo ::endgroup::
2025-12-04T12:52:14.2835484Z ##[endgroup]
2025-12-04T12:52:14.2835732Z + cleanup_workspace
2025-12-04T12:52:14.2836288Z + echo 'sudo may print the following warning message that can be ignored. The chown command will still run.'
2025-12-04T12:52:14.2837210Z sudo may print the following warning message that can be ignored. The chown command will still run.
2025-12-04T12:52:14.2837927Z + echo ' sudo: setrlimit(RLIMIT_STACK): Operation not permitted'
2025-12-04T12:52:14.2838451Z sudo: setrlimit(RLIMIT_STACK): Operation not permitted
2025-12-04T12:52:14.2839227Z + echo 'For more details refer to https://github.com/sudo-project/sudo/issues/42'
2025-12-04T12:52:14.2840082Z For more details refer to https://github.com/sudo-project/sudo/issues/42
2025-12-04T12:52:14.2840629Z + sudo chown -R 1000 /var/lib/jenkins/workspace
2025-12-04T12:52:14.9513306Z ##[error]Process completed with exit code 1.
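Note: the `sccache --show-stats` blocks above are plain two-column text. A minimal, illustrative sketch (not part of the CI scripts) of reading that output into a dict, assuming sccache is on PATH and prints its usual aligned columns (the alignment is collapsed to single spaces in this log), might look like this:

# Illustrative sketch only: collect `sccache --show-stats` output into a dict.
# Assumes sccache's aligned two-column layout (2+ spaces between name and value).
import re
import subprocess

def sccache_stats() -> dict:
    out = subprocess.run(["sccache", "--show-stats"],
                         check=True, capture_output=True, text=True).stdout
    stats = {}
    for line in out.splitlines():
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) == 2:
            stats[parts[0]] = parts[1]
    return stats

if __name__ == "__main__":
    s = sccache_stats()
    # e.g. {'Compile requests': '393', 'Cache hits': '0', ...} for the run above
    print(s.get("Compile requests"), s.get("Cache hits"), s.get("Cache hits rate"))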
2025-12-04T12:52:14.9587875Z Prepare all required actions 2025-12-04T12:52:14.9588330Z Getting action download info 2025-12-04T12:52:15.1335340Z ##[group]Run ./.github/actions/pytest-cache-upload 2025-12-04T12:52:15.1335736Z with: 2025-12-04T12:52:15.1335992Z cache_dir: .pytest_cache 2025-12-04T12:52:15.1336290Z shard: 1 2025-12-04T12:52:15.1336569Z sha: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T12:52:15.1336964Z test_config: distributed 2025-12-04T12:52:15.1337362Z job_identifier: trunk_linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T12:52:15.1337777Z env: 2025-12-04T12:52:15.1338002Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:15.1338304Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:15.1338665Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:15.1339294Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:15.1339869Z ##[endgroup] 2025-12-04T12:52:15.1374553Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T12:52:15.1374956Z with: 2025-12-04T12:52:15.1375174Z shell: bash 2025-12-04T12:52:15.1375425Z timeout_minutes: 5 2025-12-04T12:52:15.1375810Z max_attempts: 5 2025-12-04T12:52:15.1376069Z retry_wait_seconds: 30 2025-12-04T12:52:15.1376455Z command: set -eu python3 -m pip install boto3==1.35.42 2025-12-04T12:52:15.1376900Z polling_interval_seconds: 1 2025-12-04T12:52:15.1377225Z warning_on_retry: true 2025-12-04T12:52:15.1377511Z continue_on_error: false 2025-12-04T12:52:15.1377800Z env: 2025-12-04T12:52:15.1378040Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:15.1378322Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:15.1378960Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:15.1379614Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:15.1380174Z ##[endgroup] 2025-12-04T12:52:15.4925453Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T12:52:16.7121068Z Collecting boto3==1.35.42 2025-12-04T12:52:16.7294478Z Downloading boto3-1.35.42-py3-none-any.whl (139 kB) 2025-12-04T12:52:16.7453582Z Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /usr/lib/python3.9/site-packages (from boto3==1.35.42) (0.10.0) 2025-12-04T12:52:16.8128483Z Collecting s3transfer<0.11.0,>=0.10.0 2025-12-04T12:52:16.8164461Z Downloading s3transfer-0.10.4-py3-none-any.whl (83 kB) 2025-12-04T12:52:18.0860922Z Collecting botocore<1.36.0,>=1.35.42 2025-12-04T12:52:18.0899858Z Downloading botocore-1.35.99-py3-none-any.whl (13.3 MB) 2025-12-04T12:52:18.2481424Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (2.8.1) 2025-12-04T12:52:18.2489653Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.25.10) 2025-12-04T12:52:18.4230064Z Requirement already satisfied: six>=1.5 in /usr/lib/python3.9/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.15.0) 2025-12-04T12:52:18.5194192Z Installing collected packages: botocore, s3transfer, boto3 2025-12-04T12:52:19.1042063Z Successfully installed boto3-1.35.42 botocore-1.35.99 s3transfer-0.10.4 2025-12-04T12:52:19.2215392Z Command completed after 1 attempt(s). 
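The retry step above pins boto3 1.35.42 for the upload scripts, allowing up to 5 attempts with a 30-second wait between them; here it succeeded on the first attempt. A minimal Python sketch of the same fixed-wait retry pattern (this is not the nick-fields/retry action's implementation, just an equivalent for illustration) could be:

# Sketch of the retry behaviour configured above (max_attempts=5, retry_wait_seconds=30).
import subprocess
import time

def run_with_retries(cmd, max_attempts=5, wait_seconds=30):
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            print(f"Command completed after {attempt} attempt(s).")
            return
        if attempt < max_attempts:
            print(f"Attempt {attempt} failed, retrying in {wait_seconds}s...")
            time.sleep(wait_seconds)
    raise RuntimeError(f"Command failed after {max_attempts} attempts: {cmd}")

run_with_retries(["python3", "-m", "pip", "install", "boto3==1.35.42"])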
2025-12-04T12:52:19.2286036Z ##[group]Run python3 .github/scripts/pytest_cache.py \ 2025-12-04T12:52:19.2286521Z python3 .github/scripts/pytest_cache.py \ 2025-12-04T12:52:19.2286918Z  --upload \ 2025-12-04T12:52:19.2287260Z  --cache_dir "$GITHUB_WORKSPACE/$CACHE_DIR" \ 2025-12-04T12:52:19.2287794Z  --pr_identifier "$GITHUB_REF" \ 2025-12-04T12:52:19.2288164Z  --job_identifier "$JOB_IDENTIFIER" \ 2025-12-04T12:52:19.2288502Z  --sha "$SHA" \ 2025-12-04T12:52:19.2288800Z  --test_config "$TEST_CONFIG" \ 2025-12-04T12:52:19.2289129Z  --shard "$SHARD" \ 2025-12-04T12:52:19.2289410Z  --repo "$REPO" \ 2025-12-04T12:52:19.2289906Z  --temp_dir "$RUNNER_TEMP" \ 2025-12-04T12:52:19.2300803Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:52:19.2301190Z env: 2025-12-04T12:52:19.2301422Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:19.2301699Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:19.2302029Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:19.2302598Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:19.2303147Z CACHE_DIR: .pytest_cache 2025-12-04T12:52:19.2303498Z JOB_IDENTIFIER: trunk_linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T12:52:19.2303915Z SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T12:52:19.2304271Z TEST_CONFIG: distributed 2025-12-04T12:52:19.2304533Z SHARD: 1 2025-12-04T12:52:19.2304745Z REPO: pytorch/pytorch 2025-12-04T12:52:19.2305005Z ##[endgroup] 2025-12-04T12:52:19.6082729Z PR identifier for `refs/heads/main` is `96e092540d6b3c4076e3d2bc6f1f9013` 2025-12-04T12:52:19.6085219Z Uploading cache with args Namespace(upload=True, download=False, cache_dir='/home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache', pr_identifier='refs/heads/main', job_identifier='trunk_linux-jammy-cuda12.8-py3.10-gcc11', sha='ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32', test_config='distributed', shard='1', repo='pytorch/pytorch', temp_dir='/home/ec2-user/actions-runner/_work/_temp', bucket=None) 2025-12-04T12:52:19.6087907Z Zipping /home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache 2025-12-04T12:52:19.6089348Z to /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/trunk_linux-jammy-cuda12_8-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/distributed/1 2025-12-04T12:52:19.6091379Z Uploading /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/trunk_linux-jammy-cuda12_8-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/distributed/1.zip 2025-12-04T12:52:19.6093401Z to s3://gha-artifacts/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/trunk_linux-jammy-cuda12_8-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/distributed/1.zip 2025-12-04T12:52:19.6614442Z ##[group]Run cat test/**/*_toprint.log || true 2025-12-04T12:52:19.6614916Z cat test/**/*_toprint.log || true 2025-12-04T12:52:19.6621829Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:52:19.6622334Z env: 2025-12-04T12:52:19.6622567Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:19.6622845Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:19.6623157Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:19.6623737Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:19.6624253Z ##[endgroup] 2025-12-04T12:52:19.6721369Z cat: 'test/**/*_toprint.log': No such file or directory 
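The step above zips the local .pytest_cache directory and uploads it to the gha-artifacts bucket under a key built from repo, PR identifier, job identifier, SHA, test config, and shard. The real logic lives in .github/scripts/pytest_cache.py (which, as the log shows, also derives a hashed PR identifier from $GITHUB_REF and sanitizes the job name); the function below is only a simplified sketch of that flow, with the bucket and key layout mirrored from the log and everything else illustrative:

# Simplified sketch of the pytest-cache upload shown above; not the actual
# .github/scripts/pytest_cache.py implementation.
import os
import shutil
import boto3

def upload_pytest_cache(cache_dir, temp_dir, repo, pr_identifier,
                        job_identifier, sha, test_config, shard,
                        bucket="gha-artifacts"):
    # Key layout mirrored from the log:
    # pytest_cache/<repo>/<pr_identifier>/<job_identifier>/<sha>/<test_config>/<shard>.zip
    key = f"pytest_cache/{repo}/{pr_identifier}/{job_identifier}/{sha}/{test_config}/{shard}.zip"
    archive_base = os.path.join(temp_dir, "zip-upload", key[:-4])  # strip ".zip"
    os.makedirs(os.path.dirname(archive_base), exist_ok=True)
    zip_path = shutil.make_archive(archive_base, "zip", root_dir=cache_dir)
    boto3.client("s3").upload_file(zip_path, bucket, key)
    print(f"Uploaded {zip_path} to s3://{bucket}/{key}")

# Example with the values from this job (hypothetical local invocation):
# upload_pytest_cache(cache_dir=".pytest_cache", temp_dir="/tmp",
#                     repo="pytorch/pytorch",
#                     pr_identifier="96e092540d6b3c4076e3d2bc6f1f9013",
#                     job_identifier="trunk_linux-jammy-cuda12_8-py3_10-gcc11",
#                     sha="ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32",
#                     test_config="distributed", shard="1")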
2025-12-04T12:52:19.6750011Z ##[group]Run kill "$MONITOR_SCRIPT_PID" 2025-12-04T12:52:19.6750408Z kill "$MONITOR_SCRIPT_PID" 2025-12-04T12:52:19.6756191Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:52:19.6756572Z env: 2025-12-04T12:52:19.6756794Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:19.6757068Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:19.6757381Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:19.6757967Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:19.6758510Z MONITOR_SCRIPT_PID: 62930 2025-12-04T12:52:19.6758782Z ##[endgroup] 2025-12-04T12:52:19.6786741Z /home/ec2-user/actions-runner/_work/_temp/bc2451ec-0482-42a2-b076-fb929539af18.sh: line 1: kill: (62930) - No such process 2025-12-04T12:52:19.6788783Z ##[error]Process completed with exit code 1. 2025-12-04T12:52:19.6920444Z Prepare all required actions 2025-12-04T12:52:19.6920885Z Getting action download info 2025-12-04T12:52:19.9101400Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a) 2025-12-04T12:52:20.1180827Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-12-04T12:52:20.5135651Z ##[group]Run ./.github/actions/upload-test-artifacts 2025-12-04T12:52:20.5136058Z with: 2025-12-04T12:52:20.5136514Z file-suffix: test-distributed-1-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084892 2025-12-04T12:52:20.5137104Z s3-bucket: gha-artifacts 2025-12-04T12:52:20.5137410Z env: 2025-12-04T12:52:20.5137640Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:20.5137941Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:20.5138307Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:20.5138964Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:20.5139575Z ##[endgroup] 2025-12-04T12:52:20.5181483Z ##[group]Run # Remove any previous test jsons if they exist 2025-12-04T12:52:20.5182005Z # Remove any previous test jsons if they exist 2025-12-04T12:52:20.5182428Z rm -f test-jsons-*.zip 2025-12-04T12:52:20.5182904Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test/test-reports -i '*.json' 2025-12-04T12:52:20.5189202Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:52:20.5189624Z env: 2025-12-04T12:52:20.5189848Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:20.5190137Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:20.5190481Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:20.5191177Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:20.5191868Z FILE_SUFFIX: test-distributed-1-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084892 2025-12-04T12:52:20.5192354Z ##[endgroup] 2025-12-04T12:52:20.5483509Z adding: test/test-reports/td_exclusions-9df322eab3f9eaed20c1.json (deflated 86%) 2025-12-04T12:52:20.5487952Z adding: test/test-reports/python-pytest/distributed.test_dynamo_distributed/distributed.test_dynamo_distributed-7d68e185dc40b8e4.json (deflated 91%) 2025-12-04T12:52:20.5489423Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-550d70029afd2dcd.json (deflated 79%) 2025-12-04T12:52:20.5490851Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-20b4ace7c31b01bc.json (deflated 79%) 2025-12-04T12:52:20.5492348Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-fca413b3f7307fd5.json (deflated 79%) 2025-12-04T12:52:20.5494148Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-ee4bf9b90915483d.json (deflated 79%) 2025-12-04T12:52:20.5495565Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-5878d33d525e22d1.json (deflated 79%) 2025-12-04T12:52:20.5496960Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-71ced988c25116a1.json (deflated 79%) 2025-12-04T12:52:20.5498371Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-00e807acf0912dba.json (deflated 79%) 2025-12-04T12:52:20.5499800Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-b604ebea332b8d41.json (deflated 79%) 2025-12-04T12:52:20.5501214Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-320024c0c6bb40b5.json (deflated 79%) 2025-12-04T12:52:20.5502617Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-f868d67f33e24985.json (stored 0%) 2025-12-04T12:52:20.5504287Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-7c59f12ab3dc26b8.json (deflated 80%) 2025-12-04T12:52:20.5506077Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-fd5feab7f24ea67e.json (deflated 80%) 2025-12-04T12:52:20.5507713Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-42ba77c7c8182ea3.json (deflated 80%) 2025-12-04T12:52:20.5509300Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-c4d1f6f933180ae5.json (stored 0%) 2025-12-04T12:52:20.5510794Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-e7575131d09c7d5b.json (deflated 79%) 2025-12-04T12:52:20.5512228Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-fea5835408d37079.json (deflated 79%) 2025-12-04T12:52:20.5513650Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-3b87fcc1c5f1359f.json (deflated 79%) 2025-12-04T12:52:20.5515074Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-2c3852776dc4d6af.json (deflated 79%) 2025-12-04T12:52:20.5516547Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-9e3a89401a26a2c7.json (deflated 79%) 2025-12-04T12:52:20.5517964Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-dadce7936b268df6.json (deflated 79%) 2025-12-04T12:52:20.5519401Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-fc03e360104e794a.json (deflated 79%) 2025-12-04T12:52:20.5520832Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-2239b5ce820a8e80.json (deflated 
79%) 2025-12-04T12:52:20.5522252Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-750660f185e24025.json (deflated 79%) 2025-12-04T12:52:20.5523665Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-8a47980366c1ac84.json (deflated 79%) 2025-12-04T12:52:20.5525095Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-ca9abf7ee24c038e.json (deflated 79%) 2025-12-04T12:52:20.5526558Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-7eb1dc39773c41c4.json (deflated 79%) 2025-12-04T12:52:20.5527974Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-b3cb6eb5be1f3e0c.json (stored 0%) 2025-12-04T12:52:20.5529478Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-dcb6c7b6743de89e.json (deflated 80%) 2025-12-04T12:52:20.5531085Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c0871d667bd4df8d.json (deflated 80%) 2025-12-04T12:52:20.5532682Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-b1a99f4c33297699.json (deflated 88%) 2025-12-04T12:52:20.5534540Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7332aead750b9bce.json (deflated 80%) 2025-12-04T12:52:20.5536204Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c7d658062419b597.json (deflated 80%) 2025-12-04T12:52:20.5537932Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-348cc3a828a50222.json (deflated 80%) 2025-12-04T12:52:20.5539568Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-bb573131fa19ab29.json (deflated 80%) 2025-12-04T12:52:20.5541223Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-16546bb6943a3c11.json (deflated 88%) 2025-12-04T12:52:20.5542873Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-cdd2e74ccc0956b9.json (deflated 80%) 2025-12-04T12:52:20.5544536Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-9650dbe5a6e76fd8.json (deflated 88%) 2025-12-04T12:52:20.5546277Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-53bea78db525054e.json (deflated 80%) 2025-12-04T12:52:20.5547861Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-470dd7f8801a129e.json (deflated 80%) 2025-12-04T12:52:20.5549486Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-864712a0594b6ca2.json (deflated 80%) 2025-12-04T12:52:20.5551086Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c88c69879eff0a17.json (deflated 80%) 2025-12-04T12:52:20.5552685Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-ecd21e7500304b9f.json (deflated 80%) 2025-12-04T12:52:20.5554282Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-e7cfa143d1c9be09.json (deflated 80%) 2025-12-04T12:52:20.5555882Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-976f30802ad214bb.json (deflated 80%) 2025-12-04T12:52:20.5557482Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2d8b05be053af669.json (deflated 88%) 2025-12-04T12:52:20.5559075Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-ceb5badd22358e55.json (deflated 91%) 2025-12-04T12:52:20.5560711Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-f5dcf7c66579f3c2.json (deflated 80%) 2025-12-04T12:52:20.5562301Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-21e2c8920cf3865d.json (deflated 80%) 2025-12-04T12:52:20.5563895Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-3dd1dab0649736e8.json (deflated 80%) 2025-12-04T12:52:20.5565494Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-255818cdbe5fbd05.json (deflated 80%) 2025-12-04T12:52:20.5567085Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2340b7a625d10704.json (deflated 88%) 2025-12-04T12:52:20.5568679Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-6880e02fcbe22f17.json (deflated 88%) 2025-12-04T12:52:20.5570323Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-a497c1942163e16f.json (deflated 88%) 2025-12-04T12:52:20.5571924Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-a4ee3bf5f7a9a01f.json (deflated 80%) 2025-12-04T12:52:20.5573591Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2d9ebf91db9daa02.json (deflated 80%) 2025-12-04T12:52:20.5575442Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-04afe2c287023adc.json (deflated 80%) 2025-12-04T12:52:20.5577096Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-80cc64b9f2eb85b8.json (deflated 80%) 2025-12-04T12:52:20.5578907Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c38510c20e07f456.json (deflated 88%) 
2025-12-04T12:52:20.5580551Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-bec03360e514672a.json (deflated 88%) 2025-12-04T12:52:20.5582194Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7929a6c5753a5bf7.json (deflated 80%) 2025-12-04T12:52:20.5583900Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-b0deb68b75574955.json (deflated 80%) 2025-12-04T12:52:20.5585542Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-eda6c23a06d3c574.json (deflated 80%) 2025-12-04T12:52:20.5587185Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2e17c39fb483ae46.json (deflated 80%) 2025-12-04T12:52:20.5588826Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-9ad7d4c20da7406b.json (deflated 88%) 2025-12-04T12:52:20.5590582Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-5b391720a035fce0.json (deflated 80%) 2025-12-04T12:52:20.5592194Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7e0ad2dc0411fa40.json (deflated 80%) 2025-12-04T12:52:20.5593792Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-ab2eda46e6c1c6d0.json (deflated 80%) 2025-12-04T12:52:20.5595419Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7d4e43b394d06af0.json (deflated 80%) 2025-12-04T12:52:20.5597013Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-d04e17113c0af8ba.json (deflated 80%) 2025-12-04T12:52:20.5598607Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-a3b3387bd6019536.json (deflated 79%) 2025-12-04T12:52:20.5600199Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-06677f872919b29b.json (deflated 79%) 2025-12-04T12:52:20.5601790Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-da2c1a3b7d1cdaf6.json (deflated 79%) 2025-12-04T12:52:20.5603372Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-467d89e082f97fc4.json (stored 0%) 2025-12-04T12:52:20.5604888Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4e48aa8d10589348.json (deflated 79%) 2025-12-04T12:52:20.5606230Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3193e57821c2ebca.json (deflated 79%) 2025-12-04T12:52:20.5607566Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9a7469c5b46925c2.json (deflated 87%) 2025-12-04T12:52:20.5608888Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-227ae9e59104394c.json (deflated 79%) 2025-12-04T12:52:20.5610224Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-38750597d70d3b79.json (deflated 79%) 2025-12-04T12:52:20.5611558Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8d9e40030a96f20.json (deflated 88%) 2025-12-04T12:52:20.5612896Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-257466dde9fb107b.json (deflated 91%) 2025-12-04T12:52:20.5614480Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-94f5dd2e01869af2.json (deflated 79%) 2025-12-04T12:52:20.5615904Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-42c522b0340c97ac.json (deflated 79%) 2025-12-04T12:52:20.5617282Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-32205b0cc860e51d.json (deflated 79%) 2025-12-04T12:52:20.5618669Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e08ad6962badbec0.json (deflated 79%) 2025-12-04T12:52:20.5620069Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2ea67fcde569130f.json (deflated 79%) 2025-12-04T12:52:20.5621437Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2525c9886ebe84d6.json (deflated 79%) 2025-12-04T12:52:20.5622822Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-265cb7987b98bd4a.json (deflated 79%) 2025-12-04T12:52:20.5624205Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0dad56460685f27c.json (deflated 79%) 2025-12-04T12:52:20.5625693Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0a57eff14e5fabd3.json (deflated 87%) 2025-12-04T12:52:20.5627061Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f9981ec75d7ffd49.json (deflated 79%) 2025-12-04T12:52:20.5628403Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9f763e8043031072.json (deflated 87%) 2025-12-04T12:52:20.5629746Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-02dc996acd3ff226.json (deflated 79%) 2025-12-04T12:52:20.5631094Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a9525f3a3720890d.json (deflated 79%) 2025-12-04T12:52:20.5632426Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0add77cc4faf0004.json (deflated 79%) 2025-12-04T12:52:20.5633774Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-fb044786f28290de.json (deflated 79%) 2025-12-04T12:52:20.5635115Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2d3368174cbc9b5a.json (deflated 79%) 2025-12-04T12:52:20.5636690Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f48241fc1d70a928.json (deflated 87%) 
2025-12-04T12:52:20.5638039Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4808bec29186d3a1.json (deflated 79%) 2025-12-04T12:52:20.5639377Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0fe450ea21eea83e.json (deflated 79%) 2025-12-04T12:52:20.5640729Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4119ccbda03fb8bd.json (deflated 79%) 2025-12-04T12:52:20.5642080Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-77506912df9607dd.json (deflated 79%) 2025-12-04T12:52:20.5643420Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-71967038f6397bcb.json (deflated 87%) 2025-12-04T12:52:20.5644798Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7aea602ded691711.json (deflated 87%) 2025-12-04T12:52:20.5646142Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-588db22c786ffc0c.json (deflated 87%) 2025-12-04T12:52:20.5647486Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d2c93dc13a89050c.json (deflated 87%) 2025-12-04T12:52:20.5648845Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e269a47641789945.json (deflated 79%) 2025-12-04T12:52:20.5650181Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d0e2108c889b6f40.json (deflated 79%) 2025-12-04T12:52:20.5651507Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d7cc16231ece4156.json (deflated 87%) 2025-12-04T12:52:20.5652858Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-01da52837cd28026.json (deflated 79%) 2025-12-04T12:52:20.5654481Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aa4201c32172891c.json (deflated 79%) 2025-12-04T12:52:20.5655869Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4da7b120579aed6b.json (deflated 91%) 2025-12-04T12:52:20.5657268Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8c1fa5c204db7919.json (deflated 79%) 2025-12-04T12:52:20.5658672Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-79d8c8140e8d4a45.json (deflated 79%) 2025-12-04T12:52:20.5660095Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5b8cddc87d4e2da4.json (deflated 79%) 2025-12-04T12:52:20.5661503Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-312ddbdab57572f7.json (deflated 79%) 2025-12-04T12:52:20.5662905Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-fb1563559edf316c.json (deflated 87%) 2025-12-04T12:52:20.5664294Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-63a7a52cd3aa8936.json (deflated 91%) 2025-12-04T12:52:20.5665779Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a6324f00d63e140d.json 
(deflated 79%) 2025-12-04T12:52:20.5667141Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0cab4f0cffa47b1f.json (deflated 79%) 2025-12-04T12:52:20.5668501Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a997c4f2b1c679bc.json (deflated 79%) 2025-12-04T12:52:20.5669925Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bb42278badc3bd05.json (deflated 87%) 2025-12-04T12:52:20.5671289Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1e66930a4930311d.json (deflated 87%) 2025-12-04T12:52:20.5672633Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-caca850dfa53af0d.json (deflated 87%) 2025-12-04T12:52:20.5673994Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5ab6f8f72e0857a0.json (deflated 79%) 2025-12-04T12:52:20.5675349Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-82a2bf200d1dcaa2.json (deflated 79%) 2025-12-04T12:52:20.5676716Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a014b9bd1b37d049.json (deflated 79%) 2025-12-04T12:52:20.5678058Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-95f17e90ca4b9755.json (deflated 79%) 2025-12-04T12:52:20.5679815Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7ff8c73ee302c339.json (deflated 79%) 2025-12-04T12:52:20.5681215Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0cb6a3efe573e986.json (deflated 79%) 2025-12-04T12:52:20.5682680Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-44c122860a547cf4.json (deflated 91%) 2025-12-04T12:52:20.5684065Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-668210e8e09c8dd9.json (deflated 79%) 2025-12-04T12:52:20.5685458Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9448ec7a0a61b5a6.json (deflated 91%) 2025-12-04T12:52:20.5686856Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8d6fd75ad2c1f260.json (deflated 79%) 2025-12-04T12:52:20.5688253Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aa2eb835ecdd4375.json (deflated 87%) 2025-12-04T12:52:20.5689658Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5e3a192bec2a8308.json (deflated 79%) 2025-12-04T12:52:20.5691152Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c427df5212a82823.json (deflated 91%) 2025-12-04T12:52:20.5692512Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a459d6ece2e0d396.json (deflated 79%) 2025-12-04T12:52:20.5694146Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-82bd08cbda0b3168.json (deflated 79%) 2025-12-04T12:52:20.5695555Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cb7d608d20fa1845.json (deflated 79%) 2025-12-04T12:52:20.5696938Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3bf4168a6952dca5.json (deflated 87%) 2025-12-04T12:52:20.5698324Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-69250cff44e166fa.json (deflated 79%) 2025-12-04T12:52:20.5699712Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-311a7d97c78eb59e.json (deflated 79%) 2025-12-04T12:52:20.5701091Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5f502e67619c39f3.json (deflated 87%) 2025-12-04T12:52:20.5702475Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8eee5abee9febb4.json (deflated 79%) 2025-12-04T12:52:20.5703980Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-acff5684d72dd2d3.json (deflated 79%) 2025-12-04T12:52:20.5705372Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-72a03384c8cb338e.json (deflated 79%) 2025-12-04T12:52:20.5706840Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-efc6da476f35386f.json (deflated 79%) 2025-12-04T12:52:20.5708190Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ecdb0e90ac1c2bc1.json (deflated 79%) 2025-12-04T12:52:20.5709533Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9524e9df873b8be0.json (deflated 87%) 2025-12-04T12:52:20.5710885Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ad9e0258dc223929.json (deflated 87%) 2025-12-04T12:52:20.5712239Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-17a67113f8be5d53.json (deflated 87%) 2025-12-04T12:52:20.5713593Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c0681c30ea8c1a74.json (deflated 79%) 2025-12-04T12:52:20.5714936Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-27fc2bea2cad5f2f.json (deflated 79%) 2025-12-04T12:52:20.5716322Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-834db467a2bb808c.json (deflated 79%) 2025-12-04T12:52:20.5717674Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f48f6c290350ef90.json (deflated 80%) 2025-12-04T12:52:20.5719036Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1ee2f71fb8de6413.json (deflated 88%) 2025-12-04T12:52:20.5720381Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7f8857b703d650c5.json (deflated 80%) 2025-12-04T12:52:20.5721730Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-19e11b35947b0a14.json (deflated 87%) 2025-12-04T12:52:20.5723079Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3b5d6f54eb5c8ad3.json (deflated 80%) 
2025-12-04T12:52:20.5724429Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b2eb1b61ddd90ac8.json (deflated 88%) 2025-12-04T12:52:20.5725791Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e6810d3a1c38013d.json (deflated 80%) 2025-12-04T12:52:20.5727158Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-53dff883b0afb17e.json (deflated 80%) 2025-12-04T12:52:20.5728514Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2c92b48ae22d6a39.json (deflated 80%) 2025-12-04T12:52:20.5729863Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bb06b260fb006313.json (deflated 79%) 2025-12-04T12:52:20.5731203Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-29ffb7b96244526a.json (deflated 79%) 2025-12-04T12:52:20.5732545Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-64a83fa5a2cd03db.json (deflated 87%) 2025-12-04T12:52:20.5734119Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1d1edf2996f09e22.json (deflated 87%) 2025-12-04T12:52:20.5735509Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d210f9114da1dd7.json (deflated 87%) 2025-12-04T12:52:20.5736975Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ffec29f281535337.json (deflated 79%) 2025-12-04T12:52:20.5738358Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-878c5ded0afd20c5.json (deflated 79%) 2025-12-04T12:52:20.5739752Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6c1c7d4a809089a1.json (deflated 79%) 2025-12-04T12:52:20.5741142Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-78a9e7f81c58cb5d.json (deflated 79%) 2025-12-04T12:52:20.5742511Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6f57499824dd1125.json (stored 0%) 2025-12-04T12:52:20.5744094Z adding: test/test-reports/python-pytest/distributed.algorithms.ddp_comm_hooks.test_ddp_hooks/distributed.algorithms.ddp_comm_hooks.test_ddp_hooks-671b5e8f8d643201.json (deflated 86%) 2025-12-04T12:52:20.5745819Z adding: test/test-reports/python-pytest/distributed.tensor.test_op_schema/distributed.tensor.test_op_schema-bb5a16ac0960925a.json (deflated 62%) 2025-12-04T12:52:20.5747280Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_nested_dict/distributed.checkpoint.test_nested_dict-81f92522f1154383.json (deflated 64%) 2025-12-04T12:52:20.5749005Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_consolidate_hf_safetensors/distributed.checkpoint.test_consolidate_hf_safetensors-d914312b5a4148e2.json (deflated 87%) 2025-12-04T12:52:20.5750820Z adding: test/test-reports/python-pytest/distributed.checkpoint._experimental.test_barriers/distributed.checkpoint._experimental.test_barriers-d8f5a49da0f436d9.json (deflated 65%) 2025-12-04T12:52:20.5752493Z adding: test/test-reports/python-pytest/distributed.pipelining.test_transformer/distributed.pipelining.test_transformer-e70c997724b03d0e.json (deflated 
40%) 2025-12-04T12:52:20.5754089Z adding: test/test-reports/python-pytest/distributed.flight_recorder.test_fr_analysis/distributed.flight_recorder.test_fr_analysis-aba4e9f61260e449.json (deflated 78%) 2025-12-04T12:52:20.5755649Z adding: test/test-reports/python-pytest/distributed._composable.test_contract/distributed._composable.test_contract-43d2ccf9f44c35a5.json (deflated 74%) 2025-12-04T12:52:20.5757187Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_dedup_tensors/distributed.checkpoint.test_dedup_tensors-98db0ae6ec3ef072.json (deflated 39%) 2025-12-04T12:52:20.5758693Z adding: test/test-reports/python-pytest/distributed.pipelining.test_pipe/distributed.pipelining.test_pipe-b65ad592f97073ad.json (deflated 72%) 2025-12-04T12:52:20.5760171Z adding: test/test-reports/python-pytest/distributed.pipelining.test_backward/distributed.pipelining.test_backward-4f9205b7617a9aaf.json (deflated 83%) 2025-12-04T12:52:20.5761682Z adding: test/test-reports/python-pytest/distributed.test_nvshmem_triton/distributed.test_nvshmem_triton-2d1da825c1a177a7.json (deflated 97%) 2025-12-04T12:52:20.5763057Z adding: test/test-reports/python-pytest/distributed.tensor.test_dtensor/distributed.tensor.test_dtensor-780171e06b9d081c.json (deflated 93%) 2025-12-04T12:52:20.5764344Z adding: test/test-reports/python-pytest/distributed.test_p2p_ipc/distributed.test_p2p_ipc-22d7fd7242fa3e1d.json (deflated 44%) 2025-12-04T12:52:20.5765676Z adding: test/test-reports/python-pytest/distributed.tensor.test_common_rules/distributed.tensor.test_common_rules-f2e475ef5a58885a.json (deflated 89%) 2025-12-04T12:52:20.5767219Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_hf_safetensor_e2e/distributed.checkpoint.test_hf_safetensor_e2e-49bc702c32e1be14.json (deflated 87%) 2025-12-04T12:52:20.5768717Z adding: test/test-reports/python-pytest/distributed.tensor.test_dynamic/distributed.tensor.test_dynamic-58a11920d980fced.json (deflated 80%) 2025-12-04T12:52:20.5770227Z adding: test/test-reports/python-pytest/distributed.checkpoint.e2e.test_fsdp_ep/distributed.checkpoint.e2e.test_fsdp_ep-90e84c8c71d0d519.json (deflated 43%) 2025-12-04T12:52:20.5771759Z adding: test/test-reports/python-pytest/distributed.pipelining.test_unflatten/distributed.pipelining.test_unflatten-9c61fbce9d8da54e.json (deflated 39%) 2025-12-04T12:52:20.5773351Z adding: test/test-reports/python-pytest/distributed.tensor.test_dtensor_testbase/distributed.tensor.test_dtensor_testbase-90ed30d8fe3a2fcc.json (deflated 41%) 2025-12-04T12:52:20.5775176Z adding: test/test-reports/python-pytest/distributed.tensor.test_redistribute/distributed.tensor.test_redistribute-1fba6503450910ca.json (deflated 90%) 2025-12-04T12:52:20.5776566Z adding: test/test-reports/python-pytest/distributed.test_nvshmem/distributed.test_nvshmem-c601d1a92c913214.json (deflated 97%) 2025-12-04T12:52:20.5777922Z adding: test/test-reports/python-pytest/distributed.tensor.test_attention/distributed.tensor.test_attention-f7e42a024369f922.json (deflated 89%) 2025-12-04T12:52:20.5779609Z adding: test/test-reports/python-pytest/distributed.tensor.test_convolution_ops/distributed.tensor.test_convolution_ops-37d4b4387fe7c9dd.json (deflated 92%) 2025-12-04T12:52:20.5781213Z adding: test/test-reports/python-pytest/distributed.checkpoint.fsdp.test_fsdp_dsd/distributed.checkpoint.fsdp.test_fsdp_dsd-1bb4a1e7d3cbef72.json (deflated 85%) 2025-12-04T12:52:20.5782880Z adding: 
test/test-reports/python-pytest/distributed.checkpoint.test_save_load_api/distributed.checkpoint.test_save_load_api-607495a1278fa4ba.json (deflated 63%) 2025-12-04T12:52:20.5784555Z adding: test/test-reports/python-pytest/distributed.tensor.debug.test_comm_mode_features/distributed.tensor.debug.test_comm_mode_features-f7a4a3df89327d4b.json (deflated 80%) 2025-12-04T12:52:20.5786167Z adding: test/test-reports/python-pytest/distributed.tensor.test_dtensor_ops/distributed.tensor.test_dtensor_ops-ea81859469c32dce.json (stored 0%) 2025-12-04T12:52:20.5787506Z adding: test/test-reports/python-pytest/distributed.test_debug/distributed.test_debug-be889cccd8acb9a9.json (deflated 37%) 2025-12-04T12:52:20.5788907Z adding: test/test-reports/python-pytest/distributed.test_overlap_bucketing_unit/distributed.test_overlap_bucketing_unit-ca2c159a43fd5a2e.json (deflated 86%) 2025-12-04T12:52:20.5790803Z adding: test/test-reports/python-pytest/distributed.checkpoint._experimental.test_checkpoint_writer/distributed.checkpoint._experimental.test_checkpoint_writer-b51ac79d06c0ddb7.json (deflated 89%) 2025-12-04T12:52:20.5792743Z adding: test/test-reports/python-pytest/distributed.checkpoint._experimental.test_checkpointer/distributed.checkpoint._experimental.test_checkpointer-181aea9ab4e75ef7.json (deflated 82%) 2025-12-04T12:52:20.5794346Z adding: test/test-reports/python-pytest/distributed.tensor.test_init/distributed.tensor.test_init-b970b50400f392fc.json (deflated 89%) 2025-12-04T12:52:20.5795809Z adding: test/test-reports/python-pytest/distributed._composable.test_checkpoint/distributed._composable.test_checkpoint-a1aa396939174424.json (deflated 85%) 2025-12-04T12:52:20.5797346Z adding: test/test-reports/python-pytest/distributed._tools.test_fsdp2_mem_tracker/distributed._tools.test_fsdp2_mem_tracker-3cf763bb11a5de99.json (deflated 72%) 2025-12-04T12:52:20.5799022Z adding: test/test-reports/python-pytest/distributed._composable.test_replicate_mixed_precision/distributed._composable.test_replicate_mixed_precision-36b9cc9e417e77fd.json (deflated 88%) 2025-12-04T12:52:20.5800753Z adding: test/test-reports/python-pytest/distributed.checkpoint.e2e.test_fine_tuning/distributed.checkpoint.e2e.test_fine_tuning-c68ce24632e972fe.json (deflated 40%) 2025-12-04T12:52:20.5802268Z adding: test/test-reports/python-pytest/distributed.tensor.test_matrix_ops/distributed.tensor.test_matrix_ops-225fb5a0fab4f212.json (deflated 94%) 2025-12-04T12:52:20.5803709Z adding: test/test-reports/python-pytest/distributed.tensor.test_optimizers/distributed.tensor.test_optimizers-208a364b29da2421.json (deflated 90%) 2025-12-04T12:52:20.5805206Z adding: test/test-reports/python-pytest/distributed.test_symmetric_memory/distributed.test_symmetric_memory-e6666f579f07be4f.json (deflated 97%) 2025-12-04T12:52:20.5806673Z adding: test/test-reports/python-pytest/distributed._tools.test_runtime_estimator/distributed._tools.test_runtime_estimator-9422bc676dd3a656.json (deflated 64%) 2025-12-04T12:52:20.5808334Z adding: test/test-reports/python-pytest/distributed._composable.test_replicate_with_compiler/distributed._composable.test_replicate_with_compiler-845dc753fbff3b86.json (deflated 85%) 2025-12-04T12:52:20.5810119Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_autograd/distributed._composable.fsdp.test_fully_shard_autograd-6a8bc02b72927b79.json (deflated 81%) 2025-12-04T12:52:20.5812035Z adding: 
test/test-reports/python-pytest/distributed._composable.test_composability.test_2d_composability/distributed._composable.test_composability.test_2d_composability-218cfa8a31a3ba84.json (deflated 89%) 2025-12-04T12:52:20.5814028Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_optim_state/distributed.fsdp.test_fsdp_optim_state-a3d7bfb88e0bb04b.json (deflated 96%) 2025-12-04T12:52:20.5815453Z adding: test/test-reports/python-pytest/distributed.test_c10d_logger/distributed.test_c10d_logger-087942ef032695a4.json (deflated 63%) 2025-12-04T12:52:20.5817011Z adding: test/test-reports/python-pytest/distributed._composable.test_replicate_training/distributed._composable.test_replicate_training-2cbeb0e1e9d2c847.json (deflated 88%) 2025-12-04T12:52:20.5818601Z adding: test/test-reports/python-pytest/distributed.rpc.test_share_memory/distributed.rpc.test_share_memory-7afea101a44bab53.json (deflated 48%) 2025-12-04T12:52:20.5820059Z adding: test/test-reports/python-pytest/distributed.tensor.test_op_strategy/distributed.tensor.test_op_strategy-6fbbc916638ee901.json (deflated 92%) 2025-12-04T12:52:20.5821537Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_grad_acc/distributed.fsdp.test_fsdp_grad_acc-a75842029d7b9dcc.json (deflated 86%) 2025-12-04T12:52:20.5823117Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_state_dict_stager/distributed.checkpoint.test_state_dict_stager-c8decf93ed909c05.json (deflated 87%) 2025-12-04T12:52:20.5824782Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_freezing_weights/distributed.fsdp.test_fsdp_freezing_weights-c610b4e9e056a60a.json (deflated 96%) 2025-12-04T12:52:20.5826565Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_init/distributed._composable.fsdp.test_fully_shard_init-94adf46d5612666a.json (deflated 93%) 2025-12-04T12:52:20.5828179Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_flatten_params/distributed.fsdp.test_fsdp_flatten_params-1722984b0a3e650a.json (deflated 91%) 2025-12-04T12:52:20.5829665Z adding: test/test-reports/python-pytest/distributed.test_composability/distributed.test_composability-4dcd79eb001aa4cf.json (deflated 91%) 2025-12-04T12:52:20.5831042Z adding: test/test-reports/python-pytest/distributed.test_multi_threaded_pg/distributed.test_multi_threaded_pg-cb00591a34ee6ad2.json (deflated 90%) 2025-12-04T12:52:20.5832678Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_extensions/distributed._composable.fsdp.test_fully_shard_extensions-e4e54db12d00fc4b.json (deflated 83%) 2025-12-04T12:52:20.5834250Z adding: test/test-reports/python-pytest/distributed.fsdp.test_wrap/distributed.fsdp.test_wrap-8d38fac6f1a86713.json (deflated 89%) 2025-12-04T12:52:20.5835622Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_hybrid_shard/distributed.fsdp.test_fsdp_hybrid_shard-b37436896c0f0a07.json (deflated 83%) 2025-12-04T12:52:20.5837262Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_training/distributed._composable.fsdp.test_fully_shard_training-6c9b30f951a7219e.json (deflated 89%) 2025-12-04T12:52:20.5839005Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-d961ab4b1fb94450.json (deflated 56%) 2025-12-04T12:52:20.5840577Z adding: 
test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-2749604adbdd83d7.json (deflated 36%) 2025-12-04T12:52:20.5842152Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-a08d4078ed5ab3a4.json (deflated 35%) 2025-12-04T12:52:20.5843728Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-f043e483cea1f140.json (deflated 36%) 2025-12-04T12:52:20.5845308Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-562cdf6dc98614a4.json (deflated 36%) 2025-12-04T12:52:20.5846872Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-7a355d7d848a3783.json (deflated 35%) 2025-12-04T12:52:20.5848437Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-e938b1bd7ee21e63.json (deflated 35%) 2025-12-04T12:52:20.5850012Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-7b620378c03b2b8c.json (deflated 37%) 2025-12-04T12:52:20.5851594Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-51e337155d132168.json (deflated 39%) 2025-12-04T12:52:20.5853152Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-858bb2cae53302d6.json (deflated 38%) 2025-12-04T12:52:20.5854989Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-f578d6bffa26b363.json (deflated 38%) 2025-12-04T12:52:20.5856613Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-3d9a69729c5194dd.json (deflated 38%) 2025-12-04T12:52:20.5858232Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-907751cfd0f9a14d.json (deflated 38%) 2025-12-04T12:52:20.5859854Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-9a92a0723441ebbb.json (deflated 38%) 2025-12-04T12:52:20.5861463Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-000e8f0311241e72.json (deflated 37%) 2025-12-04T12:52:20.5863114Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-18ef89366410fd29.json (deflated 37%) 2025-12-04T12:52:20.5864725Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-ae7171f8ebe30954.json (deflated 37%) 2025-12-04T12:52:20.5866417Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-03f5357c50df6990.json (deflated 37%) 2025-12-04T12:52:20.5867970Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-df0148e80116049a.json (deflated 36%) 2025-12-04T12:52:20.5869542Z adding: 
test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-f8722be29ef1f355.json (deflated 37%) 2025-12-04T12:52:20.5871109Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-5b2d107716225579.json (deflated 36%) 2025-12-04T12:52:20.5872730Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-ad4182055d9f07f8.json (deflated 37%) 2025-12-04T12:52:20.5874306Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-b572842ca37510d6.json (deflated 36%) 2025-12-04T12:52:20.5875869Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-0b1a30e997ca6431.json (deflated 36%) 2025-12-04T12:52:20.5877434Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-e562e66ae42d90cb.json (deflated 36%) 2025-12-04T12:52:20.5879319Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-e753f34d0efc412f.json (deflated 37%) 2025-12-04T12:52:20.5880950Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-3e869ce9df9ec961.json (deflated 37%) 2025-12-04T12:52:20.5882567Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-2b82b1f0b5e5f8cd.json (deflated 36%) 2025-12-04T12:52:20.5884176Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-a8c407980a31ffc0.json (deflated 37%) 2025-12-04T12:52:20.5885859Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-2efe6b4638116b91.json (deflated 36%) 2025-12-04T12:52:20.5887474Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-98c90d11be4a4494.json (deflated 38%) 2025-12-04T12:52:20.5889095Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-4931d5d769ccdcd0.json (deflated 37%) 2025-12-04T12:52:20.5890704Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-05476064036aedd6.json (deflated 37%) 2025-12-04T12:52:20.5892386Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-8e55f782fe295c55.json (deflated 37%) 2025-12-04T12:52:20.5894193Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-fed8717c328843a6.json (deflated 38%) 2025-12-04T12:52:20.5895818Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-de76ea5d9fe707d8.json (deflated 37%) 2025-12-04T12:52:20.5897488Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-0f7e482c95e1e619.json (deflated 37%) 2025-12-04T12:52:20.5899108Z adding: 
test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-7ab8d1b1bfba2dc7.json (deflated 37%) 2025-12-04T12:52:20.5900729Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-79d029624b0186af.json (deflated 38%) 2025-12-04T12:52:20.5902353Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-7ef38c3a629f7d6a.json (deflated 38%) 2025-12-04T12:52:20.5903970Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-a3eb91c932561739.json (deflated 36%) 2025-12-04T12:52:20.5905591Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-5c4ed4bc28536a40.json (deflated 36%) 2025-12-04T12:52:20.5907333Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-4d50dcf186234952.json (deflated 37%) 2025-12-04T12:52:20.5908906Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-ce507d7407dcfdcd.json (deflated 37%) 2025-12-04T12:52:20.5910478Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-fb440b5504373386.json (deflated 37%) 2025-12-04T12:52:20.5912052Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-170bafadd5b2b85e.json (deflated 38%) 2025-12-04T12:52:20.5913617Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-58170fe4322a80c7.json (deflated 56%) 2025-12-04T12:52:20.5915283Z adding: test/test-reports/python-pytest/distributed.optim.test_zero_redundancy_optimizer/distributed.optim.test_zero_redundancy_optimizer-541994707c39cee5.json (deflated 95%) 2025-12-04T12:52:20.5916768Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-08577689f3d858a6.json (deflated 34%) 2025-12-04T12:52:20.5918017Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c743ab120f5be65e.json (deflated 33%) 2025-12-04T12:52:20.5919304Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1a323813051995ff.json (deflated 33%) 2025-12-04T12:52:20.5920534Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5c255dbdc27d4d77.json (deflated 33%) 2025-12-04T12:52:20.5921788Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c9f1202b2ef2d0e8.json (deflated 34%) 2025-12-04T12:52:20.5923041Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8281b785ed89747f.json (deflated 34%) 2025-12-04T12:52:20.5924295Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-de1cfac33910faa2.json (deflated 33%) 2025-12-04T12:52:20.5925532Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-864ab6c46117080c.json (deflated 33%) 2025-12-04T12:52:20.5926775Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-76784aced4c97984.json (deflated 34%) 
2025-12-04T12:52:20.5928033Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-703de7f7dca9caca.json (deflated 33%) 2025-12-04T12:52:20.5929292Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0adaa780135147c4.json (deflated 34%) 2025-12-04T12:52:20.5930583Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9b446c4fb19b7fa9.json (deflated 34%) 2025-12-04T12:52:20.5931839Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a7525f86ad26c33d.json (deflated 33%) 2025-12-04T12:52:20.5933085Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d6753a042db8c209.json (deflated 33%) 2025-12-04T12:52:20.5934580Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2349534806fd7876.json (deflated 33%) 2025-12-04T12:52:20.5935845Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-42ee5035490db9e3.json (deflated 33%) 2025-12-04T12:52:20.5937132Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-50ae93e5877e6267.json (deflated 34%) 2025-12-04T12:52:20.5938424Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a89286bdf69beec6.json (deflated 34%) 2025-12-04T12:52:20.5939785Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3fe0b73bb8411ca0.json (deflated 33%) 2025-12-04T12:52:20.5941067Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4b626fd6cfef7d3a.json (deflated 33%) 2025-12-04T12:52:20.5942354Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8330e64091d76cd1.json (deflated 34%) 2025-12-04T12:52:20.5943633Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c23dbc048e48feb8.json (deflated 34%) 2025-12-04T12:52:20.5944920Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-89bb327f63e4e8bb.json (deflated 34%) 2025-12-04T12:52:20.5946280Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9c8a9e0d041cedea.json (deflated 33%) 2025-12-04T12:52:20.5947537Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-42322dfcd604c1a2.json (deflated 47%) 2025-12-04T12:52:20.5948794Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-05269c810dd53a0a.json (deflated 33%) 2025-12-04T12:52:20.5950034Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-496a22975dc71079.json (deflated 33%) 2025-12-04T12:52:20.5951280Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-63ac84cde87e453b.json (deflated 34%) 2025-12-04T12:52:20.5952543Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4f81974b0076c9ff.json (deflated 36%) 2025-12-04T12:52:20.5953788Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6030e8ef08b1e09a.json (deflated 35%) 2025-12-04T12:52:20.5955029Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bd13602843a8b4fd.json (deflated 35%) 2025-12-04T12:52:20.5956280Z 
adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9df479069693f235.json (deflated 34%) 2025-12-04T12:52:20.5957517Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4af234e53b06ab6b.json (deflated 34%) 2025-12-04T12:52:20.5958769Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cc6adfb19ea74de2.json (deflated 35%) 2025-12-04T12:52:20.5960083Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8f1e4f77d9fcfd90.json (deflated 35%) 2025-12-04T12:52:20.5961342Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-27f8ee33cfcf037f.json (deflated 35%) 2025-12-04T12:52:20.5962638Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cdeff74b74265a35.json (deflated 35%) 2025-12-04T12:52:20.5963885Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a9993b41f082320c.json (deflated 39%) 2025-12-04T12:52:20.5965107Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3652407d28f2c215.json (deflated 34%) 2025-12-04T12:52:20.5966361Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d3dba64b9a4678ed.json (deflated 35%) 2025-12-04T12:52:20.5967619Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3a5b171659e4eecd.json (deflated 37%) 2025-12-04T12:52:20.5968879Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c138db9f1cd0e29e.json (deflated 35%) 2025-12-04T12:52:20.5970114Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-113748ca2c5988c5.json (deflated 43%) 2025-12-04T12:52:20.5971359Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9d127482d5c0a15f.json (deflated 36%) 2025-12-04T12:52:20.5972659Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0c6be5601f204f6a.json (deflated 34%) 2025-12-04T12:52:20.5974152Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f818599ff0d45c17.json (deflated 35%) 2025-12-04T12:52:20.5975441Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6285dc53d0288723.json (deflated 36%) 2025-12-04T12:52:20.5976714Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-13ada6365eaa3764.json (deflated 34%) 2025-12-04T12:52:20.5978000Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a8eee205d4f58a65.json (deflated 35%) 2025-12-04T12:52:20.5979467Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-88abaaf5c04af9c6.json (deflated 35%) 2025-12-04T12:52:20.5980771Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f3a90decd7629fa5.json (deflated 32%) 2025-12-04T12:52:20.5982050Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9d96b951f094adb8.json (deflated 32%) 2025-12-04T12:52:20.5983338Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-21b096edd5ee1b6a.json (deflated 32%) 2025-12-04T12:52:20.5984666Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a0b61d4860845bc5.json (deflated 33%) 2025-12-04T12:52:20.5985957Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0dbdae2b70ece5b8.json (deflated 33%) 2025-12-04T12:52:20.5987236Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-af13cf2684481c71.json (deflated 33%) 2025-12-04T12:52:20.5988520Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8d92a42ccc718652.json (deflated 31%) 2025-12-04T12:52:20.5989801Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3c2b86f8fd4d9656.json (deflated 33%) 2025-12-04T12:52:20.5991275Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5a0ef3ecf28b8b71.json (deflated 33%) 2025-12-04T12:52:20.5992482Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-dde6a545db9a9d8a.json (deflated 32%) 2025-12-04T12:52:20.5993691Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e5facf21e5ab1561.json (deflated 33%) 2025-12-04T12:52:20.5994952Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8fe4943e76f9400e.json (deflated 32%) 2025-12-04T12:52:20.5996161Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1c6197817aebc3a0.json (deflated 33%) 2025-12-04T12:52:20.5997365Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-aec658dbe3b6bd3e.json (deflated 32%) 2025-12-04T12:52:20.5998573Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eda16941220f320e.json (deflated 33%) 2025-12-04T12:52:20.5999973Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e4305ddff77c0f5f.json (deflated 32%) 2025-12-04T12:52:20.6001223Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0e3a131c8f0d73a7.json (deflated 33%) 2025-12-04T12:52:20.6002453Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1ba7f15a8c05fd39.json (deflated 33%) 2025-12-04T12:52:20.6003706Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-af2c578cafcc8d63.json (deflated 33%) 2025-12-04T12:52:20.6005003Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-578bb605dc4ba552.json (deflated 33%) 2025-12-04T12:52:20.6006255Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b86217cb65e9b710.json (deflated 46%) 2025-12-04T12:52:20.6007482Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f2854c8345d5fc1c.json (deflated 32%) 2025-12-04T12:52:20.6008740Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c653d1bf5ed2c0fa.json (deflated 32%) 2025-12-04T12:52:20.6009988Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-55cd7c7ba73bcdc2.json (deflated 32%) 2025-12-04T12:52:20.6011244Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-ffbbe82dbea62111.json (deflated 33%) 2025-12-04T12:52:20.6012505Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-538ab2b49097f28c.json (deflated 33%) 2025-12-04T12:52:20.6013958Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fe265780a95359c3.json (deflated 33%) 2025-12-04T12:52:20.6015245Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-faa48c97f4df83cb.json (deflated 33%) 2025-12-04T12:52:20.6016569Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c0cd36b746cb3623.json (deflated 33%) 2025-12-04T12:52:20.6017848Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3b1171fb862d722e.json (deflated 33%) 2025-12-04T12:52:20.6019123Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-df4dd3eb2cdaa291.json (deflated 33%) 2025-12-04T12:52:20.6020416Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-dd69f8dcdadc9fb4.json (deflated 33%) 2025-12-04T12:52:20.6021707Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f54bdf9dde5bf97e.json (deflated 33%) 2025-12-04T12:52:20.6022998Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-83c9fcb3057380d5.json (deflated 33%) 2025-12-04T12:52:20.6024269Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e826f0e43397861e.json (deflated 33%) 2025-12-04T12:52:20.6025841Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e73673b2d41bd335.json (deflated 33%) 2025-12-04T12:52:20.6026973Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1456794f12812e80.json (deflated 33%) 2025-12-04T12:52:20.6028137Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e4e8c70c34dfdfe1.json (deflated 33%) 2025-12-04T12:52:20.6029264Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-be166d073ca55795.json (deflated 33%) 2025-12-04T12:52:20.6030415Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c49bebe8dd8e46f5.json (deflated 33%) 2025-12-04T12:52:20.6031570Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c0f82bf827258b5a.json (deflated 33%) 2025-12-04T12:52:20.6032716Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-983f78b8ab7f02b9.json (deflated 33%) 2025-12-04T12:52:20.6033843Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1ceb5c06515b90e6.json (deflated 33%) 2025-12-04T12:52:20.6034988Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8871c4d5afa11da1.json (deflated 33%) 2025-12-04T12:52:20.6036193Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-005dd80a1d165bd6.json (deflated 33%) 2025-12-04T12:52:20.6037340Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b8dfb4b667064c84.json (deflated 34%) 2025-12-04T12:52:20.6038468Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-279f10471d7d2185.json (deflated 33%) 2025-12-04T12:52:20.6039614Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2732174555722247.json (deflated 33%) 2025-12-04T12:52:20.6040758Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-39eff7f371737bec.json (deflated 33%) 2025-12-04T12:52:20.6041910Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1bd2ca8c9ccb5d1d.json (deflated 33%) 2025-12-04T12:52:20.6043050Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6de6dc4ebcbeafe7.json (deflated 33%) 2025-12-04T12:52:20.6044214Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ef1db87d402e61e.json (deflated 33%) 2025-12-04T12:52:20.6045359Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-07b0bb118974fac5.json (deflated 33%) 2025-12-04T12:52:20.6046520Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b45dd1b51a454859.json (deflated 33%) 2025-12-04T12:52:20.6047667Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c5c7f8af22555084.json (deflated 33%) 2025-12-04T12:52:20.6048797Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0d5f0a9727dbb8c1.json (deflated 33%) 2025-12-04T12:52:20.6049949Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-efc5df0b6f603c8e.json (deflated 34%) 2025-12-04T12:52:20.6051097Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d3947ad663b8a0a5.json (deflated 35%) 2025-12-04T12:52:20.6052235Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9a8e98b254282d7c.json (deflated 35%) 2025-12-04T12:52:20.6053417Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6f2a37a539841f4a.json (deflated 35%) 2025-12-04T12:52:20.6054837Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-11da4acc57c01e3e.json (deflated 34%) 2025-12-04T12:52:20.6056124Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-efa00c53d9ffafd0.json (deflated 32%) 2025-12-04T12:52:20.6057446Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c139de0f468bfbd0.json (deflated 32%) 2025-12-04T12:52:20.6058716Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-508c49e985b343e0.json (deflated 35%) 2025-12-04T12:52:20.6059997Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-85c75a6ef966eda2.json (deflated 32%) 2025-12-04T12:52:20.6061285Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d3b5821ebfa1d2b9.json (deflated 32%) 2025-12-04T12:52:20.6062567Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-78c3c6f081c06510.json (deflated 32%) 2025-12-04T12:52:20.6063845Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eceb8537f545c10b.json (deflated 33%) 2025-12-04T12:52:20.6065136Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-997f8b36df0838da.json (deflated 33%) 2025-12-04T12:52:20.6066573Z adding: 
test/test-reports/python-pytest/distributed.test_launcher/distributed.test_launcher-ab711efd5b5eae9c.json (deflated 39%) 2025-12-04T12:52:20.6067746Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-4402e4ca07679d5e.json (deflated 33%) 2025-12-04T12:52:20.6068823Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-59586d5fa8d9df00.json (deflated 33%) 2025-12-04T12:52:20.6069920Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-904d5bf6ccd1c7aa.json (deflated 33%) 2025-12-04T12:52:20.6071012Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c074807310bb3c83.json (deflated 39%) 2025-12-04T12:52:20.6072091Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-8f207b69f7a673c5.json (deflated 33%) 2025-12-04T12:52:20.6073170Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-cb7ce4d5a847e19b.json (deflated 33%) 2025-12-04T12:52:20.6074258Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-52257f6acad204d5.json (deflated 33%) 2025-12-04T12:52:20.6075341Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-b16442ffcab7dd38.json (deflated 44%) 2025-12-04T12:52:20.6076424Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-a12f788b611f7140.json (deflated 43%) 2025-12-04T12:52:20.6077532Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-dfaa32f264045766.json (deflated 44%) 2025-12-04T12:52:20.6078734Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-8da17e1f6a078343.json (deflated 44%) 2025-12-04T12:52:20.6080087Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-e8525e9dd27a79c3.json (deflated 33%) 2025-12-04T12:52:20.6081308Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-6f41a4ce4013a7c2.json (deflated 33%) 2025-12-04T12:52:20.6082520Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-0470d4ac72d7a50e.json (deflated 34%) 2025-12-04T12:52:20.6083750Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-3a36dde009a45dc5.json (deflated 33%) 2025-12-04T12:52:20.6084979Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-df81badd9797f785.json (deflated 33%) 2025-12-04T12:52:20.6086202Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-350d6caf6618d2b5.json (deflated 33%) 2025-12-04T12:52:20.6087412Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9a9af3abc1b0b41d.json (deflated 33%) 2025-12-04T12:52:20.6088696Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-619ffbde41a10c5d.json (deflated 33%) 2025-12-04T12:52:20.6089926Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c5cf03e47a405c4b.json (deflated 34%) 2025-12-04T12:52:20.6091160Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-af6e2b8d803b9c4f.json (deflated 33%) 2025-12-04T12:52:20.6092368Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-97c2051ccee23a9b.json (deflated 33%) 2025-12-04T12:52:20.6093517Z adding: 
test/test-reports/python-pytest/distributed.test_store/distributed.test_store-482d04b678af0ece.json (deflated 33%) 2025-12-04T12:52:20.6094888Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-10499bf9f759075b.json (deflated 33%) 2025-12-04T12:52:20.6096111Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-cccae4cdf350788c.json (deflated 33%) 2025-12-04T12:52:20.6097339Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9241ea60ff5c054f.json (deflated 34%) 2025-12-04T12:52:20.6098631Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c77cdf590d6c4d53.json (deflated 34%) 2025-12-04T12:52:20.6099856Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9d86817452e27e08.json (deflated 33%) 2025-12-04T12:52:20.6101071Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-4b2a51af148732c1.json (deflated 32%) 2025-12-04T12:52:20.6102294Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-36cc6892ce1d13b2.json (deflated 32%) 2025-12-04T12:52:20.6103509Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-f3e053be23766a80.json (deflated 32%) 2025-12-04T12:52:20.6104738Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c4831c850757caf9.json (deflated 33%) 2025-12-04T12:52:20.6106060Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-15d93d0dff93e000.json (deflated 33%) 2025-12-04T12:52:20.6107152Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-97c506133bdf82e2.json (deflated 44%) 2025-12-04T12:52:20.6108224Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-760dbf0b3a7076aa.json (deflated 43%) 2025-12-04T12:52:20.6109307Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-0c82075f9025f767.json (deflated 43%) 2025-12-04T12:52:20.6110416Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-3e48901c9a873f45.json (deflated 43%) 2025-12-04T12:52:20.6111493Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-7a3e24041d2ef943.json (deflated 32%) 2025-12-04T12:52:20.6112562Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-04f97f132e861e46.json (deflated 33%) 2025-12-04T12:52:20.6113640Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-de7756f9c1641da6.json (deflated 33%) 2025-12-04T12:52:20.6114733Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-1bca8b988eb41bac.json (deflated 33%) 2025-12-04T12:52:20.6115813Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-286102c30fda404f.json (deflated 32%) 2025-12-04T12:52:20.6116884Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-bedd718db983bac0.json (deflated 33%) 2025-12-04T12:52:20.6117971Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-eefe267c1e87f355.json (deflated 33%) 2025-12-04T12:52:20.6119057Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-75d254d8d2b940a0.json (deflated 38%) 2025-12-04T12:52:20.6120173Z adding: 
test/test-reports/python-pytest/distributed.test_store/distributed.test_store-d29ba7e8ccb2ecb1.json (deflated 32%) 2025-12-04T12:52:20.6121252Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-f94b0a13c28491ca.json (deflated 33%) 2025-12-04T12:52:20.6122338Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-d359118932c9b995.json (deflated 33%) 2025-12-04T12:52:20.6123425Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-f756c8323a1d09e8.json (deflated 34%) 2025-12-04T12:52:20.6124511Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-1240aa89fcaf1417.json (deflated 34%) 2025-12-04T12:52:20.6125594Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-d2b8c5d98b2db0d3.json (deflated 33%) 2025-12-04T12:52:20.6126691Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-f04571bca3f6577a.json (deflated 33%) 2025-12-04T12:52:20.6127782Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-62a8c36072d028e3.json (deflated 44%) 2025-12-04T12:52:20.6128899Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-085d0338122bdd88.json (deflated 43%) 2025-12-04T12:52:20.6129965Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-3aff210f96c86539.json (deflated 43%) 2025-12-04T12:52:20.6131050Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-50126680f72685fd.json (deflated 43%) 2025-12-04T12:52:20.6132312Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-265b5fcb7c5f4add.json (deflated 32%) 2025-12-04T12:52:20.6133469Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-658bdaf47e5d9fc0.json (deflated 32%) 2025-12-04T12:52:20.6134889Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9359cc0c923fd357.json (deflated 33%) 2025-12-04T12:52:20.6136119Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-8f02a426ff307186.json (deflated 34%) 2025-12-04T12:52:20.6137334Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-724b43aaa3e86430.json (deflated 33%) 2025-12-04T12:52:20.6138557Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-849142fee7d3fe7a.json (deflated 33%) 2025-12-04T12:52:20.6139792Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-40597616ec98a508.json (deflated 32%) 2025-12-04T12:52:20.6141023Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-8de5caa0f44ab195.json (deflated 33%) 2025-12-04T12:52:20.6142254Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-70a2189ada91e7b4.json (deflated 32%) 2025-12-04T12:52:20.6143478Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c9872634fae2a2a2.json (deflated 31%) 2025-12-04T12:52:20.6144684Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-a2053711ae870746.json (deflated 32%) 2025-12-04T12:52:20.6146118Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-4ef2fc9fec34c264.json (deflated 38%) 2025-12-04T12:52:20.6147220Z adding: 
test/test-reports/python-pytest/distributed.test_store/distributed.test_store-b4a7a6fe6b411ab3.json (deflated 32%) 2025-12-04T12:52:20.6148310Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-3a29202ead173617.json (deflated 32%) 2025-12-04T12:52:20.6149383Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-a61155d6f938b2cc.json (deflated 32%) 2025-12-04T12:52:20.6150493Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9b41f59ee0cfeb75.json (deflated 33%) 2025-12-04T12:52:20.6151582Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-307094c901db62b6.json (deflated 33%) 2025-12-04T12:52:20.6152670Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c3332bb3687882a6.json (deflated 32%) 2025-12-04T12:52:20.6153737Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-422a006275d6f6d2.json (deflated 32%) 2025-12-04T12:52:20.6154821Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-b842f6182997ffac.json (deflated 32%) 2025-12-04T12:52:20.6155913Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-b78401240c2392a0.json (deflated 32%) 2025-12-04T12:52:20.6157002Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-05d0a7320ac2f2e5.json (deflated 32%) 2025-12-04T12:52:20.6158082Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-56eae3d52dfef9e0.json (deflated 32%) 2025-12-04T12:52:20.6159223Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9598d85b65a1dd25.json (deflated 32%) 2025-12-04T12:52:20.6160309Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9433dc4d80f4e3fb.json (deflated 32%) 2025-12-04T12:52:20.6161400Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-fbcb32de5a4aaa3b.json (deflated 33%) 2025-12-04T12:52:20.6162484Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c2ec29ec8ed5fa00.json (deflated 34%) 2025-12-04T12:52:20.6163573Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-7fbbe4d1eb982186.json (deflated 32%) 2025-12-04T12:52:20.6164661Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-fd140e21219ecfa7.json (deflated 32%) 2025-12-04T12:52:20.6165751Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-db87db829f00dbc2.json (deflated 32%) 2025-12-04T12:52:20.6166822Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-0645df0da2606eed.json (deflated 32%) 2025-12-04T12:52:20.6167898Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-5be93906577b570a.json (deflated 33%) 2025-12-04T12:52:20.6168984Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-cb158e2a6356a16b.json (deflated 32%) 2025-12-04T12:52:20.6170094Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-1efba08034268a13.json (deflated 33%) 2025-12-04T12:52:20.6171180Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-1cfd040a8029b228.json (deflated 33%) 2025-12-04T12:52:20.6172253Z adding: 
test/test-reports/python-pytest/distributed.test_store/distributed.test_store-26b3b1f841eed644.json (deflated 32%) 2025-12-04T12:52:20.6173393Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-8363e71100168a00.json (deflated 33%) 2025-12-04T12:52:20.6174746Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-449df693536afa26.json (deflated 32%) 2025-12-04T12:52:20.6175972Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-a65d4b001bf04cc5.json (deflated 33%) 2025-12-04T12:52:20.6177185Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-a841b70017fda049.json (deflated 32%) 2025-12-04T12:52:20.6178408Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-4cd5969ebb7a8971.json (deflated 33%) 2025-12-04T12:52:20.6179810Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-8be9136d4926e94c.json (deflated 34%) 2025-12-04T12:52:20.6181065Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-804b0113d0e9f4eb.json (deflated 34%) 2025-12-04T12:52:20.6182282Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-03bdd58db5584705.json (deflated 33%) 2025-12-04T12:52:20.6183506Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-ebfe437a9c6a9ad8.json (deflated 33%) 2025-12-04T12:52:20.6184729Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-a1970e4a95b6fcaa.json (deflated 32%) 2025-12-04T12:52:20.6185955Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-e7b46a0191ac24db.json (deflated 33%) 2025-12-04T12:52:20.6187170Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-7c2efcfcaf566fcf.json (deflated 32%) 2025-12-04T12:52:20.6188397Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-98f57491a6bbb62c.json (deflated 32%) 2025-12-04T12:52:20.6189622Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-4dd659a2a136b6f5.json (deflated 32%) 2025-12-04T12:52:20.6191095Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-ed9bd2c70267a528.json (deflated 32%) 2025-12-04T12:52:20.6192171Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-2244d9322e930674.json (deflated 34%) 2025-12-04T12:52:20.6193249Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c9423adcd896df81.json (deflated 33%) 2025-12-04T12:52:20.6194334Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-627f0929637f665a.json (deflated 33%) 2025-12-04T12:52:20.6195425Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c04d647e93aaefae.json (deflated 33%) 2025-12-04T12:52:20.6196506Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-91ec2ebad1950a1b.json (deflated 33%) 2025-12-04T12:52:20.6197597Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-70b9e0e0ebbe8c78.json (deflated 33%) 2025-12-04T12:52:20.6198684Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-64a3329e5489e4a1.json (deflated 33%) 2025-12-04T12:52:20.6199767Z adding: 
test/test-reports/python-pytest/distributed.test_store/distributed.test_store-286648de75ab2791.json (deflated 34%) 2025-12-04T12:52:20.6200865Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-aba6b9e907975cf2.json (deflated 34%) 2025-12-04T12:52:20.6201950Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9858e278543d5c8a.json (deflated 33%) 2025-12-04T12:52:20.6203026Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-7b426550a7173d4d.json (deflated 34%) 2025-12-04T12:52:20.6204109Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-687616f36cc07fc7.json (deflated 34%) 2025-12-04T12:52:20.6205178Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-97db8e5668905dba.json (deflated 34%) 2025-12-04T12:52:20.6206262Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-ad88807c8d637ecf.json (deflated 34%) 2025-12-04T12:52:20.6207349Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-b4bc3ece8958b620.json (deflated 33%) 2025-12-04T12:52:20.6208437Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-dad34641aa2ec6a8.json (deflated 34%) 2025-12-04T12:52:20.6209515Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-4581fd35a1a9c062.json (deflated 32%) 2025-12-04T12:52:20.6210603Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-3bb2e8b2b3b8f504.json (deflated 32%) 2025-12-04T12:52:20.6211714Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-21612ac6bc46612d.json (deflated 34%) 2025-12-04T12:52:20.6212831Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-424c7796b1d4da37.json (deflated 34%) 2025-12-04T12:52:20.6214232Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7bb4f0d3928e2ed2.json (deflated 43%) 2025-12-04T12:52:20.6215518Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-33318bd0fe5ba50d.json (deflated 33%) 2025-12-04T12:52:20.6216801Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-40835271667093d0.json (deflated 33%) 2025-12-04T12:52:20.6218077Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-c5cf88214d708090.json (deflated 33%) 2025-12-04T12:52:20.6219351Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-fe7d871fae3e5a4d.json (deflated 33%) 2025-12-04T12:52:20.6220698Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a2f486ca4ce8f2e5.json (deflated 33%) 2025-12-04T12:52:20.6221983Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2514e248fb19a6f7.json (deflated 33%) 2025-12-04T12:52:20.6223258Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5fddd299999d2001.json (deflated 35%) 2025-12-04T12:52:20.6224536Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-526fccf11a728f1b.json (deflated 33%) 2025-12-04T12:52:20.6225929Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ae02a87d7dee25c6.json (deflated 35%) 
2025-12-04T12:52:20.6227188Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-92c41cc069cc4d37.json (deflated 33%) 2025-12-04T12:52:20.6228325Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-8ae44307030d30dc.json (deflated 57%) 2025-12-04T12:52:20.6229459Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9417cbc175c56634.json (deflated 33%) 2025-12-04T12:52:20.6230600Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-14e447da1762fe9d.json (deflated 33%) 2025-12-04T12:52:20.6231776Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-f1da8f898c8da1fd.json (deflated 34%) 2025-12-04T12:52:20.6232929Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-60a2acbc27c8df8e.json (deflated 33%) 2025-12-04T12:52:20.6234075Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9cce25ecd2936a9e.json (deflated 33%) 2025-12-04T12:52:20.6235209Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7981bf6fc2476012.json (deflated 33%) 2025-12-04T12:52:20.6236346Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-68308ed30b7c8249.json (deflated 33%) 2025-12-04T12:52:20.6237488Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7d82f8c0c3f71993.json (deflated 35%) 2025-12-04T12:52:20.6238624Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-bc89766ea636312a.json (deflated 35%) 2025-12-04T12:52:20.6239742Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-14132ee63704b359.json (deflated 36%) 2025-12-04T12:52:20.6240882Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d030ce3f22be0dac.json (deflated 35%) 2025-12-04T12:52:20.6242046Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-08e08a8b7a3c9688.json (deflated 34%) 2025-12-04T12:52:20.6243187Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-04413b57dd0bd1bb.json (deflated 34%) 2025-12-04T12:52:20.6244318Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ede7c35169d447e5.json (deflated 36%) 2025-12-04T12:52:20.6245462Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-03d41d77810a93c2.json (deflated 35%) 2025-12-04T12:52:20.6246605Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e5d454fc87664e79.json (deflated 35%) 2025-12-04T12:52:20.6247749Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9b5d1be0cdc61898.json (deflated 35%) 2025-12-04T12:52:20.6248883Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-cb65338c3c68b015.json (deflated 35%) 2025-12-04T12:52:20.6250032Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ea9d6160bdcb14ea.json (deflated 35%) 2025-12-04T12:52:20.6251220Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-54b6e103bc500b82.json (deflated 35%) 2025-12-04T12:52:20.6252375Z 
adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-43cbd90ab1e74433.json (deflated 35%) 2025-12-04T12:52:20.6253587Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e1a5076677cc9040.json (deflated 34%) 2025-12-04T12:52:20.6255036Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5b874423f18e9e6f.json (deflated 35%) 2025-12-04T12:52:20.6256324Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-42a99cc8097c27cf.json (deflated 36%) 2025-12-04T12:52:20.6257617Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ff575ff252e1ccc1.json (deflated 35%) 2025-12-04T12:52:20.6258900Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-8f0aac570bbe3c22.json (deflated 34%) 2025-12-04T12:52:20.6260191Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1e46e2d43687e8c9.json (deflated 36%) 2025-12-04T12:52:20.6261469Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-8983151d5ee422e9.json (deflated 43%) 2025-12-04T12:52:20.6262783Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-21007923fa28eb94.json (deflated 37%) 2025-12-04T12:52:20.6264076Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a6d199e27a3f20ba.json (deflated 37%) 2025-12-04T12:52:20.6265357Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-333908f1f4e63432.json (deflated 36%) 2025-12-04T12:52:20.6266674Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-10f6223b6a5799e8.json (deflated 35%) 2025-12-04T12:52:20.6267820Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0488ec6f0d084de4.json (deflated 35%) 2025-12-04T12:52:20.6268974Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d0fe3ffcefb63fed.json (deflated 35%) 2025-12-04T12:52:20.6270118Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-aede6c2e8b0a576d.json (deflated 33%) 2025-12-04T12:52:20.6271259Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a402d736ec805725.json (deflated 34%) 2025-12-04T12:52:20.6272423Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3d885be53a1fd8f7.json (deflated 33%) 2025-12-04T12:52:20.6273564Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e68bb841a642474c.json (deflated 33%) 2025-12-04T12:52:20.6274690Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-f1f78f076f606689.json (deflated 44%) 2025-12-04T12:52:20.6275833Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-31755bde92c246ad.json (deflated 35%) 2025-12-04T12:52:20.6276972Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-954c6b44d604c3c5.json (deflated 34%) 2025-12-04T12:52:20.6278108Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-edecd7ac68a78e0c.json (deflated 34%) 2025-12-04T12:52:20.6279595Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1b2e7a904a2bde2c.json (deflated 35%) 2025-12-04T12:52:20.6280891Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-8e8e63e24b22fdf3.json (deflated 35%) 2025-12-04T12:52:20.6282280Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-cac81d05bb378553.json (deflated 35%) 2025-12-04T12:52:20.6283570Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-44879a601c328a3c.json (deflated 35%) 2025-12-04T12:52:20.6284839Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-fbb8b0d505262428.json (deflated 35%) 2025-12-04T12:52:20.6286135Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0417ada3fc299914.json (deflated 34%) 2025-12-04T12:52:20.6287410Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-6ec92773f760b446.json (deflated 31%) 2025-12-04T12:52:20.6288698Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-27748f3e1c06d895.json (deflated 48%) 2025-12-04T12:52:20.6289969Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5b8215da1349e52d.json (deflated 32%) 2025-12-04T12:52:20.6291345Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-97e205e1ddd96df1.json (deflated 34%) 2025-12-04T12:52:20.6292602Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-8ef86c99e3cf3225.json (deflated 35%) 2025-12-04T12:52:20.6294013Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-6a001d92e92403d5.json (deflated 35%) 2025-12-04T12:52:20.6295282Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1548f33eb44681e7.json (deflated 34%) 2025-12-04T12:52:20.6296567Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-85e41dba7cec0336.json (deflated 33%) 2025-12-04T12:52:20.6297861Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-c9aa6140346d940e.json (deflated 34%) 2025-12-04T12:52:20.6299150Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-677b6de586c48a10.json (deflated 34%) 2025-12-04T12:52:20.6300433Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-129a903d843c6506.json (deflated 34%) 2025-12-04T12:52:20.6301707Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a8e0d0b35114e966.json (deflated 35%) 2025-12-04T12:52:20.6302988Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-00b8787048f81655.json (deflated 34%) 2025-12-04T12:52:20.6304273Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9749a48df2369fa2.json (deflated 34%) 2025-12-04T12:52:20.6305712Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7b61630e604191fa.json (deflated 34%) 2025-12-04T12:52:20.6306980Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-babd79b2628f20c6.json (deflated 32%) 2025-12-04T12:52:20.6308122Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-c321a873e0094d80.json (deflated 33%) 2025-12-04T12:52:20.6309265Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9d5bf7025c6dd7a3.json (deflated 33%) 2025-12-04T12:52:20.6310409Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-6fdafbad56912bb4.json (deflated 33%) 2025-12-04T12:52:20.6311546Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-fac34b62cd6cb996.json (deflated 33%) 2025-12-04T12:52:20.6312685Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-18c9f847c7541135.json (deflated 33%) 2025-12-04T12:52:20.6313881Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-aa0ccd92b1be664c.json (deflated 33%) 2025-12-04T12:52:20.6315025Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-42a2588e5ba30eef.json (deflated 33%) 2025-12-04T12:52:20.6316154Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-6de735cb6d76f6db.json (deflated 33%) 2025-12-04T12:52:20.6317301Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-b9c0c81401e31d8c.json (deflated 32%) 2025-12-04T12:52:20.6318441Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-554a092032e2d568.json (deflated 33%) 2025-12-04T12:52:20.6319584Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1bdd7c0fac046180.json (deflated 35%) 2025-12-04T12:52:20.6320718Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-b30b289e682ac666.json (deflated 35%) 2025-12-04T12:52:20.6321858Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-db0ee76b53b2ddf3.json (deflated 34%) 2025-12-04T12:52:20.6323010Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4fbe3fc8cf0cb3d0.json (deflated 42%) 2025-12-04T12:52:20.6324282Z adding: test/test-reports/python-pytest/distributed.elastic.events.lib_test/distributed.elastic.events.lib_test-07a790705f8742f5.json (deflated 87%) 2025-12-04T12:52:20.6325636Z adding: test/test-reports/python-pytest/distributed.elastic.metrics.api_test/distributed.elastic.metrics.api_test-089696776a609d56.json (deflated 75%) 2025-12-04T12:52:20.6327088Z adding: test/test-reports/python-pytest/distributed.elastic.timer.local_timer_example/distributed.elastic.timer.local_timer_example-2bef7f019a87a08a.json (deflated 60%) 2025-12-04T12:52:20.6328605Z adding: test/test-reports/python-pytest/distributed.elastic.timer.local_timer_test/distributed.elastic.timer.local_timer_test-7292bedd4140d1cb.json (deflated 90%) 2025-12-04T12:52:20.6330097Z adding: test/test-reports/python-pytest/distributed.elastic.utils.distributed_test/distributed.elastic.utils.distributed_test-37b4dd92e3796470.json (deflated 88%) 2025-12-04T12:52:20.6331560Z adding: test/test-reports/python-pytest/distributed.elastic.utils.logging_test/distributed.elastic.utils.logging_test-ad00506eaa0f6b8e.json (deflated 64%) 2025-12-04T12:52:20.6332936Z adding: test/test-reports/python-pytest/distributed.elastic.utils.util_test/distributed.elastic.utils.util_test-06e2f9e323fc3569.json (deflated 90%) 2025-12-04T12:52:20.6334562Z adding: 
test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-30657b6825f5a9b9.json (stored 0%) 2025-12-04T12:52:20.6336054Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2659630a9052ba32.json (stored 0%) 2025-12-04T12:52:20.6337520Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-35db7c52cb4e4963.json (stored 0%) 2025-12-04T12:52:20.6338967Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d045df6f00832674.json (stored 0%) 2025-12-04T12:52:20.6340437Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2e7babf23a98fec.json (stored 0%) 2025-12-04T12:52:20.6341901Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ce1d4513008f30b5.json (stored 0%) 2025-12-04T12:52:20.6343375Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-386d51a44811a37c.json (stored 0%) 2025-12-04T12:52:20.6344884Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a4c267fd423ef4fb.json (stored 0%) 2025-12-04T12:52:20.6346399Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-422923c4a3ff9000.json (stored 0%) 2025-12-04T12:52:20.6347707Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-869317da10a8a2cd.json (stored 0%) 2025-12-04T12:52:20.6349024Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-02900a63a52fefe8.json (stored 0%) 2025-12-04T12:52:20.6350331Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-20776a9c6c63a20c.json (stored 0%) 2025-12-04T12:52:20.6351634Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9bfba6a18599d81e.json (stored 0%) 2025-12-04T12:52:20.6352958Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-89a2aaebca19d4e0.json (stored 0%) 2025-12-04T12:52:20.6354273Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a7bfcf341dfacd6a.json (stored 0%) 2025-12-04T12:52:20.6355593Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cefb54cda6fbbd60.json (stored 0%) 2025-12-04T12:52:20.6356935Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1caa1cefc24b5566.json (stored 0%) 2025-12-04T12:52:20.6358235Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d6f460f0a05f7b70.json (stored 0%) 2025-12-04T12:52:20.6359564Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3757b46df7c65698.json (deflated 44%) 2025-12-04T12:52:20.6360909Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-222eea109d5ddc53.json (deflated 43%) 2025-12-04T12:52:20.6362243Z 
adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4fbe4fa34b470fa8.json (deflated 44%) 2025-12-04T12:52:20.6363569Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b37a4634b8fad8ea.json (deflated 43%) 2025-12-04T12:52:20.6364896Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4758d32b2c1097e5.json (deflated 43%) 2025-12-04T12:52:20.6366249Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f048625fe3cca682.json (deflated 44%) 2025-12-04T12:52:20.6367580Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4dbf8960b326e96e.json (deflated 44%) 2025-12-04T12:52:20.6368903Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-43e82ed2dcb87d3f.json (deflated 44%) 2025-12-04T12:52:20.6370224Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5fb482cd3d23f5d7.json (deflated 43%) 2025-12-04T12:52:20.6371550Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c4860aca019c3141.json (deflated 44%) 2025-12-04T12:52:20.6372864Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-065ee64125ced44a.json (deflated 42%) 2025-12-04T12:52:20.6374500Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7bccf121b18dac1f.json (deflated 43%) 2025-12-04T12:52:20.6376053Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c0b49c26d87da987.json (deflated 43%) 2025-12-04T12:52:20.6377555Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3693e8a7c975c90a.json (deflated 36%) 2025-12-04T12:52:20.6379234Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-fe1b63bffb2c6a3e.json (deflated 37%) 2025-12-04T12:52:20.6380758Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3c99c8b7e71138e8.json (deflated 36%) 2025-12-04T12:52:20.6382240Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-989a5c6b3d6a9965.json (deflated 37%) 2025-12-04T12:52:20.6383746Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6823ac8a1907d169.json (deflated 36%) 2025-12-04T12:52:20.6385248Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ac5bb86a9b9f7fe.json (deflated 37%) 2025-12-04T12:52:20.6386737Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-32f1715c008a3066.json (deflated 38%) 2025-12-04T12:52:20.6388262Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-70519aa5531a3696.json (deflated 37%) 2025-12-04T12:52:20.6389746Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5dfee2b8fb7ba7cc.json (deflated 36%) 2025-12-04T12:52:20.6391246Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3caf33f6f4116297.json (deflated 44%) 2025-12-04T12:52:20.6392576Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-087aca976836f603.json (deflated 38%) 2025-12-04T12:52:20.6393906Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9841b7f077764a91.json (deflated 36%) 2025-12-04T12:52:20.6395226Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c090e23a7f2fffdb.json (deflated 36%) 2025-12-04T12:52:20.6396560Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f5459ba8e137555e.json (deflated 36%) 2025-12-04T12:52:20.6397878Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-80a7f4a66302355b.json (deflated 35%) 2025-12-04T12:52:20.6399412Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2d6cbfeff36754e.json (deflated 45%) 2025-12-04T12:52:20.6400818Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f874e7939adea357.json (deflated 45%) 2025-12-04T12:52:20.6402224Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7fc3f50bae53c154.json (deflated 38%) 2025-12-04T12:52:20.6403634Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d4f854973242c798.json (deflated 36%) 2025-12-04T12:52:20.6405046Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-567d6dba79efc754.json (deflated 57%) 2025-12-04T12:52:20.6406450Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-32dae883b8f409b4.json (deflated 38%) 2025-12-04T12:52:20.6407906Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f35b4f146cba39ed.json (deflated 43%) 2025-12-04T12:52:20.6409314Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-11d385bda710106b.json (deflated 46%) 2025-12-04T12:52:20.6410725Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-16f3fe014f6f2b90.json (deflated 37%) 2025-12-04T12:52:20.6412137Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-314d187f8a4f1527.json (deflated 43%) 2025-12-04T12:52:20.6413603Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-37e0f31b5d0eb001.json (deflated 37%) 2025-12-04T12:52:20.6415263Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a10e98f2eef47033.json (deflated 37%) 2025-12-04T12:52:20.6416764Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0373e508dc2d66c0.json (deflated 37%) 2025-12-04T12:52:20.6418262Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2844fa97ba5d525c.json (deflated 37%) 2025-12-04T12:52:20.6419795Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f42a82f6cff1f523.json (deflated 57%) 2025-12-04T12:52:20.6421278Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-668dafa0f73e7876.json (deflated 37%) 2025-12-04T12:52:20.6422780Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1bb4dfdf3a617cc5.json (deflated 38%) 2025-12-04T12:52:20.6424280Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ec275c22b294744.json (deflated 38%) 2025-12-04T12:52:20.6425879Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4d848c3599ac1349.json (deflated 36%) 2025-12-04T12:52:20.6427337Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ca68035b34ae5b62.json (deflated 37%) 2025-12-04T12:52:20.6428662Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-33f84bea341754fa.json (deflated 37%) 2025-12-04T12:52:20.6429989Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-57896abf4e71b1aa.json (deflated 38%) 2025-12-04T12:52:20.6431370Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4ce0111ddde5df94.json (deflated 37%) 2025-12-04T12:52:20.6432708Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-75ed533594a3939c.json (deflated 37%) 2025-12-04T12:52:20.6434033Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d58dadb18f5c0dbb.json (deflated 36%) 2025-12-04T12:52:20.6435371Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-53f849c69fe671b8.json (deflated 43%) 2025-12-04T12:52:20.6436704Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4f8ce5171d9832f1.json (deflated 44%) 2025-12-04T12:52:20.6438053Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-caca2e381afec0ac.json (deflated 37%) 2025-12-04T12:52:20.6439388Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-19e4e102157e6e47.json (deflated 43%) 2025-12-04T12:52:20.6440747Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-426e6a6be0a16d20.json (deflated 44%) 2025-12-04T12:52:20.6442075Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f27afe7768298281.json (deflated 43%) 2025-12-04T12:52:20.6443390Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1742254d60a81949.json (deflated 46%) 2025-12-04T12:52:20.6444718Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-731fd65d812688e2.json (deflated 45%) 2025-12-04T12:52:20.6446036Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cb1039971f7213e1.json (deflated 37%) 2025-12-04T12:52:20.6447364Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7b2ab90303eda495.json (deflated 46%) 2025-12-04T12:52:20.6448692Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-15ad2d4900971da8.json (deflated 43%) 2025-12-04T12:52:20.6450019Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8bcbe99e24fef6d4.json (deflated 45%) 2025-12-04T12:52:20.6451368Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc5956b0c1d4a301.json (deflated 45%) 2025-12-04T12:52:20.6452703Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f4b951168e9fd206.json (deflated 43%) 2025-12-04T12:52:20.6454373Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-34586d84d9e8b574.json (deflated 37%) 2025-12-04T12:52:20.6455883Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-346f56e4dab409e9.json (deflated 37%) 2025-12-04T12:52:20.6457383Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-86debbf241b646f7.json (deflated 37%) 2025-12-04T12:52:20.6458879Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-82deea2afca27b05.json (deflated 37%) 2025-12-04T12:52:20.6460388Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-68e170c9c9399262.json (deflated 44%) 2025-12-04T12:52:20.6461875Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0cb408a515a3366a.json (deflated 36%) 2025-12-04T12:52:20.6463403Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-753ec666c5362e2c.json (deflated 36%) 2025-12-04T12:52:20.6464898Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c6738b863174a943.json (deflated 36%) 2025-12-04T12:52:20.6466539Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0285e1ff7eb488da.json (deflated 37%) 2025-12-04T12:52:20.6467871Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-10c387b6e730ad52.json (deflated 37%) 2025-12-04T12:52:20.6469198Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ffaa3570c6b46738.json (deflated 37%) 2025-12-04T12:52:20.6470531Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9d71dba8115f24b8.json (deflated 43%) 2025-12-04T12:52:20.6472152Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b8b4b6bac082ff37.json (deflated 36%) 2025-12-04T12:52:20.6473494Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e2c322ea7f3ebbd2.json (deflated 44%) 2025-12-04T12:52:20.6474825Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f03e83663e9a4601.json (deflated 37%) 2025-12-04T12:52:20.6476162Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6b48b6a282b4704f.json (deflated 43%) 2025-12-04T12:52:20.6477488Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7cfe16ef8c24bdf2.json (deflated 43%) 2025-12-04T12:52:20.6478957Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-53f3ffdc1525a0fa.json (deflated 38%) 2025-12-04T12:52:20.6480624Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4005fef94f6d8aae.json (deflated 43%) 2025-12-04T12:52:20.6482119Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6e8960fc72004342.json (deflated 37%) 2025-12-04T12:52:20.6483710Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2756d0ca7e2d02f0.json (deflated 46%) 2025-12-04T12:52:20.6497329Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-badcb6057cbf1c54.json (deflated 45%) 2025-12-04T12:52:20.6498865Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-95a798b363d84da6.json (deflated 37%) 2025-12-04T12:52:20.6500383Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e0a914bf0d30c722.json (deflated 44%) 2025-12-04T12:52:20.6501894Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0504f34b27fc1bbf.json (deflated 43%) 2025-12-04T12:52:20.6503386Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-870b2179518ff22e.json (deflated 44%) 2025-12-04T12:52:20.6504889Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-555ef90b62bf7ebe.json (deflated 43%) 2025-12-04T12:52:20.6506599Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ef83d3f03358d4cd.json (deflated 43%) 2025-12-04T12:52:20.6508237Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-782ee14ea9598dfa.json (deflated 44%) 2025-12-04T12:52:20.6509575Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-206e0f4991cecb09.json (deflated 44%) 2025-12-04T12:52:20.6510911Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1dfcc212a7b4183f.json (deflated 44%) 2025-12-04T12:52:20.6512257Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-58fc1504b82abecc.json (deflated 43%) 2025-12-04T12:52:20.6513602Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-51bfe9fe1a7f1a6c.json (deflated 44%) 2025-12-04T12:52:20.6514949Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-442ba0fa5d15173c.json (deflated 42%) 2025-12-04T12:52:20.6516360Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-316e66014d1168cc.json (deflated 43%) 2025-12-04T12:52:20.6517694Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-055618da6a57a7c3.json (deflated 43%) 2025-12-04T12:52:20.6519029Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-59c00404add2d546.json (deflated 36%) 2025-12-04T12:52:20.6520365Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f4cf6377210f7d08.json (deflated 37%) 2025-12-04T12:52:20.6521699Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd879e6321523d4a.json (deflated 37%) 2025-12-04T12:52:20.6523041Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-11a49a1cb92bd713.json (deflated 37%) 2025-12-04T12:52:20.6524378Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4d5ef940e3b8d7d9.json (deflated 36%) 2025-12-04T12:52:20.6525712Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f8d88460db673001.json (deflated 37%) 2025-12-04T12:52:20.6527052Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7312243f5e57fab8.json (deflated 38%) 2025-12-04T12:52:20.6528419Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d24b6142b7eafb02.json (deflated 37%) 2025-12-04T12:52:20.6529759Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7ddaf535a31194a8.json (deflated 36%) 2025-12-04T12:52:20.6531097Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b1941bd1a84ade60.json (deflated 44%) 2025-12-04T12:52:20.6532440Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b4c97a3c97a6540e.json (deflated 37%) 2025-12-04T12:52:20.6534065Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a4f9a9719cac0e27.json (deflated 37%) 2025-12-04T12:52:20.6535586Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ffbbd0582ae0ed08.json (deflated 36%) 2025-12-04T12:52:20.6537105Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8fceb5aff8b3d919.json (deflated 36%) 2025-12-04T12:52:20.6538644Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4991d9e148183f8a.json (deflated 35%) 2025-12-04T12:52:20.6540152Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2f340536cc04891.json (deflated 45%) 2025-12-04T12:52:20.6541648Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d54b5f82a473b96e.json (deflated 45%) 2025-12-04T12:52:20.6543166Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-32d70eb5fbea7de4.json (deflated 38%) 2025-12-04T12:52:20.6544686Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-20eb72c43e1aae78.json (deflated 36%) 2025-12-04T12:52:20.6546368Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1b3c54de3705157f.json (deflated 57%) 2025-12-04T12:52:20.6547695Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-708887f203c7efd9.json (deflated 38%) 2025-12-04T12:52:20.6549080Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-308288814df1046f.json (deflated 43%) 2025-12-04T12:52:20.6550408Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a9b36875380651b.json (deflated 46%) 2025-12-04T12:52:20.6551745Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b640778d73caa6a8.json (deflated 37%) 2025-12-04T12:52:20.6553079Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eb7fed148c64c9a0.json (deflated 43%) 2025-12-04T12:52:20.6554412Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-73e1340249d39849.json (deflated 37%) 2025-12-04T12:52:20.6555756Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2b53b6883ca9eced.json (deflated 37%) 2025-12-04T12:52:20.6557105Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-588a66c8cc282acc.json (deflated 37%) 2025-12-04T12:52:20.6558440Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e2b4e294105fc8d.json (deflated 37%) 2025-12-04T12:52:20.6559799Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1876d2067da9cba3.json (deflated 57%) 2025-12-04T12:52:20.6561132Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c0f841dc2b4efb2.json (deflated 36%) 2025-12-04T12:52:20.6562472Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3a51aa631f2d9c9e.json (deflated 37%) 2025-12-04T12:52:20.6563811Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-36fdf430bc89743e.json (deflated 38%) 2025-12-04T12:52:20.6565137Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e97389ff3c3fed34.json (deflated 37%) 2025-12-04T12:52:20.6566466Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7431cb10fe427793.json (deflated 37%) 2025-12-04T12:52:20.6567797Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-300aa7c2334a8d69.json (deflated 37%) 2025-12-04T12:52:20.6569154Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-59ebecc4d72f0d7e.json (deflated 37%) 2025-12-04T12:52:20.6570501Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8cd9ca1fa3ae4b5b.json (deflated 37%) 2025-12-04T12:52:20.6571830Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4f733a13fa18eefb.json (deflated 37%) 2025-12-04T12:52:20.6573171Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ba067e3c1dbc48f6.json (deflated 36%) 2025-12-04T12:52:20.6574860Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5982d44c9781d305.json (deflated 43%) 2025-12-04T12:52:20.6576377Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d3cdd5c6558c74f8.json (deflated 44%) 2025-12-04T12:52:20.6577881Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9abbb37614212e4e.json (deflated 37%) 2025-12-04T12:52:20.6579669Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c0b61128e53994e8.json (deflated 43%) 2025-12-04T12:52:20.6581175Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-efe3817fcb01a456.json (deflated 44%) 2025-12-04T12:52:20.6582673Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-406d537981e1a2b6.json (deflated 43%) 2025-12-04T12:52:20.6584182Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c93bf90c971a4a89.json (deflated 46%) 2025-12-04T12:52:20.6585672Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-63e194c9a12582ed.json (deflated 45%) 2025-12-04T12:52:20.6587184Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e97ed0330e85ad8d.json (deflated 37%) 2025-12-04T12:52:20.6588698Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a7b490db4a9340e.json (deflated 46%) 2025-12-04T12:52:20.6590202Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4a0629607c3fa5fd.json (deflated 43%) 2025-12-04T12:52:20.6591817Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5dbb3e619be4e12b.json (deflated 45%) 2025-12-04T12:52:20.6593156Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ded592e1a5a858e0.json (deflated 45%) 2025-12-04T12:52:20.6594494Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5bb87080fbf758d2.json (deflated 43%) 2025-12-04T12:52:20.6595835Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9e5898fa75a7f3ef.json (deflated 37%) 2025-12-04T12:52:20.6597169Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b16de0301895bd76.json (deflated 37%) 2025-12-04T12:52:20.6598499Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2cc94a088d38104.json (deflated 37%) 2025-12-04T12:52:20.6599837Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c2b712ac61ede43d.json (deflated 37%) 2025-12-04T12:52:20.6601171Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc30b3da1a421347.json (deflated 44%) 2025-12-04T12:52:20.6602540Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9fe2cef34c40d5eb.json (deflated 36%) 2025-12-04T12:52:20.6603886Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c658fa95ccd2cf6a.json (deflated 36%) 2025-12-04T12:52:20.6605224Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-19a881494eafddb6.json (deflated 36%) 2025-12-04T12:52:20.6606565Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-95816b9cbefdfdc1.json (deflated 37%) 2025-12-04T12:52:20.6607895Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2148875896839649.json (deflated 37%) 2025-12-04T12:52:20.6609228Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f0127aecf34ab538.json (deflated 37%) 2025-12-04T12:52:20.6610616Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f0d23338c1bb516d.json (deflated 43%) 2025-12-04T12:52:20.6611941Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8fc91952c534b6f9.json (deflated 36%) 2025-12-04T12:52:20.6613270Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-335bee712b3a4821.json (deflated 44%) 2025-12-04T12:52:20.6614950Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f1537c74fba16ec9.json (deflated 38%) 2025-12-04T12:52:20.6616448Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f23e70d3f89edf0.json (deflated 43%) 2025-12-04T12:52:20.6617951Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-148183526f50032e.json (deflated 43%) 2025-12-04T12:52:20.6619458Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8190f13f98ccf625.json (deflated 37%) 2025-12-04T12:52:20.6620971Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d1fbcdbdafd7eb07.json (deflated 43%) 2025-12-04T12:52:20.6622515Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8debf7e3f937383c.json (deflated 37%) 2025-12-04T12:52:20.6624017Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-42a201482bdd75d2.json (deflated 46%) 2025-12-04T12:52:20.6625519Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7d9c4803d3c82177.json (deflated 45%) 2025-12-04T12:52:20.6627094Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-70ca556779e39720.json (deflated 37%) 2025-12-04T12:52:20.6628419Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-690ee27b5383666e.json (deflated 38%) 2025-12-04T12:52:20.6629738Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b61a79c2ba69c2fc.json (deflated 44%) 2025-12-04T12:52:20.6631064Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23785637709aa6cb.json (deflated 44%) 2025-12-04T12:52:20.6632383Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5c2a0306c654054e.json (deflated 37%) 2025-12-04T12:52:20.6633726Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c55b916a52817402.json (deflated 44%) 2025-12-04T12:52:20.6635045Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9018f81b0b2988f3.json (deflated 44%) 2025-12-04T12:52:20.6636359Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7f11b61d4653bdb7.json (deflated 44%) 2025-12-04T12:52:20.6637687Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0b135b7c4b7778b0.json (deflated 45%) 2025-12-04T12:52:20.6639014Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5fce459c37c59c7e.json (deflated 37%) 2025-12-04T12:52:20.6640334Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-54617e16ec702700.json (deflated 44%) 2025-12-04T12:52:20.6641653Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2b550c1faec52346.json (deflated 37%) 2025-12-04T12:52:20.6643023Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e030e348c4a08d77.json (deflated 42%) 2025-12-04T12:52:20.6644344Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f3321c7d9da11433.json (deflated 36%) 2025-12-04T12:52:20.6645669Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-84298f366cb09c62.json (deflated 36%) 2025-12-04T12:52:20.6646990Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6de764c5b54fbe28.json (deflated 37%) 2025-12-04T12:52:20.6648304Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a94cb9872a07c07.json (deflated 37%) 2025-12-04T12:52:20.6649628Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-42095590a7a2fe72.json (deflated 37%) 2025-12-04T12:52:20.6650939Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a90373c5366804c7.json (deflated 36%) 2025-12-04T12:52:20.6652251Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-557572c546cf01dc.json (deflated 37%) 2025-12-04T12:52:20.6653650Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3a95186e1de85626.json (deflated 38%) 2025-12-04T12:52:20.6655307Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-96a6fe2e34c367be.json (deflated 37%) 2025-12-04T12:52:20.6656805Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ac9393c6a20ceccf.json (deflated 36%) 2025-12-04T12:52:20.6658306Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-76abd5633e378237.json (deflated 37%) 2025-12-04T12:52:20.6659794Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a41c806c0c2532bc.json (deflated 45%) 2025-12-04T12:52:20.6661276Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0fea3b9528c40c04.json (deflated 45%) 2025-12-04T12:52:20.6662771Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-87084c93c4099c6c.json (deflated 36%) 2025-12-04T12:52:20.6664290Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4521c7b0bbbf49e6.json (deflated 36%) 2025-12-04T12:52:20.6665886Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2ae149566d0f8ed8.json (deflated 35%) 2025-12-04T12:52:20.6667331Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ddafbffca1b4a2a2.json (deflated 37%) 2025-12-04T12:52:20.6668664Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-495169b72e129e3c.json (deflated 37%) 2025-12-04T12:52:20.6669997Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-edaf36e92a9dab8d.json (deflated 38%) 2025-12-04T12:52:20.6671321Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b1f0e24e21eda42b.json (deflated 36%) 2025-12-04T12:52:20.6672646Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8b16af08a6bda590.json (deflated 57%) 2025-12-04T12:52:20.6674011Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d719be89195db636.json (deflated 38%) 2025-12-04T12:52:20.6675336Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-53d64dcee99de118.json (deflated 37%) 2025-12-04T12:52:20.6676663Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ea452bf92699d1d7.json (deflated 36%) 2025-12-04T12:52:20.6677987Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b0d1b37af5093171.json (deflated 45%) 2025-12-04T12:52:20.6679661Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a15143db106f15f7.json (deflated 44%) 2025-12-04T12:52:20.6681168Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bf396dfb3b4b2bb4.json (deflated 42%) 2025-12-04T12:52:20.6682672Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bb35081df35448e3.json (deflated 42%) 2025-12-04T12:52:20.6684162Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-35b1275cb510efb5.json (deflated 37%) 2025-12-04T12:52:20.6685714Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e9367509f991dbb3.json (deflated 37%) 2025-12-04T12:52:20.6687207Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1bd7d011a01059ce.json (deflated 57%) 2025-12-04T12:52:20.6688702Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3dc760c16caaf0c2.json (deflated 37%) 2025-12-04T12:52:20.6690207Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-77a48723971bcff5.json (deflated 37%) 2025-12-04T12:52:20.6691881Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-db5717cebce92ef6.json (deflated 38%) 2025-12-04T12:52:20.6693202Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8eeacffd41be6e39.json (deflated 36%) 2025-12-04T12:52:20.6694875Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ebed908b36444f09.json (deflated 37%) 2025-12-04T12:52:20.6696365Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7ef96dc46d436334.json (deflated 37%) 2025-12-04T12:52:20.6697907Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ab7428bbadf34c3f.json (deflated 45%) 2025-12-04T12:52:20.6699395Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c50e50b95ac4e028.json (deflated 37%) 2025-12-04T12:52:20.6700885Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a38147e023273460.json (deflated 37%) 2025-12-04T12:52:20.6702373Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-95d5c860934b2c1e.json (deflated 36%) 2025-12-04T12:52:20.6703866Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-25a3f0e33205a56a.json (deflated 37%) 2025-12-04T12:52:20.6705368Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7eb201dfe29a7ff2.json (deflated 37%) 2025-12-04T12:52:20.6707017Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-50b8421dbc1badac.json (deflated 43%) 2025-12-04T12:52:20.6708348Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-11a93d269a83940f.json (deflated 37%) 2025-12-04T12:52:20.6709664Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-674b848f32484d1e.json (deflated 37%) 2025-12-04T12:52:20.6710984Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4121a01708986f89.json (deflated 37%) 2025-12-04T12:52:20.6712300Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-994ce7e474295a12.json (deflated 37%) 2025-12-04T12:52:20.6713622Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee469c21c041c7a4.json (deflated 37%) 2025-12-04T12:52:20.6714949Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3246d443991c34d3.json (deflated 37%) 2025-12-04T12:52:20.6716277Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3d96eaef94da7418.json (deflated 43%) 2025-12-04T12:52:20.6717633Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3dacf547cad6a67d.json (deflated 37%) 2025-12-04T12:52:20.6719194Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3616b6f968436ee1.json (deflated 43%) 2025-12-04T12:52:20.6720595Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-622c1eaa2fb92c9d.json (deflated 37%) 2025-12-04T12:52:20.6722007Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-306192510a63b544.json (deflated 37%) 2025-12-04T12:52:20.6723409Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-82095872e2babc1f.json (deflated 37%) 2025-12-04T12:52:20.6724814Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c84a9abadf4e6456.json (deflated 43%) 2025-12-04T12:52:20.6726226Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc9893ace33f3830.json (deflated 45%) 2025-12-04T12:52:20.6727627Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-daefb987452748c4.json (deflated 37%) 2025-12-04T12:52:20.6729067Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3092859278d7bcb6.json (deflated 37%) 2025-12-04T12:52:20.6730460Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f10308d532e69d5.json (deflated 36%) 2025-12-04T12:52:20.6731849Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6ce015a4b2872362.json (deflated 36%) 2025-12-04T12:52:20.6733246Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-224e5072b13b6f72.json (deflated 36%) 2025-12-04T12:52:20.6734954Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-722d50f6db17be0a.json (deflated 37%) 2025-12-04T12:52:20.6736463Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3132a5f3cbdc8d40.json (deflated 37%) 2025-12-04T12:52:20.6737963Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c2952957ec5e4941.json (deflated 37%) 2025-12-04T12:52:20.6739514Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-088131fd2e1eb740.json (deflated 37%) 2025-12-04T12:52:20.6741016Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e35a0f5543ab7ba.json (deflated 36%) 2025-12-04T12:52:20.6742517Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cec746d21c82a41d.json (deflated 43%) 2025-12-04T12:52:20.6744021Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-33aec040946b9fff.json (deflated 38%) 2025-12-04T12:52:20.6745513Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6714c1f1957114b9.json (deflated 36%) 2025-12-04T12:52:20.6747151Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6c5fb352bbe9e8c7.json (deflated 43%) 2025-12-04T12:52:20.6748630Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-308f137ee98c9d9b.json (deflated 46%) 2025-12-04T12:52:20.6749959Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5cf923d1d132c738.json (deflated 37%) 2025-12-04T12:52:20.6751308Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e9017a66c842280.json (deflated 42%) 2025-12-04T12:52:20.6752619Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52008c5414012f53.json (deflated 37%) 2025-12-04T12:52:20.6753950Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-707c869a600b32c4.json (deflated 37%) 2025-12-04T12:52:20.6755289Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9afcc52ba56cb7cd.json (deflated 37%) 2025-12-04T12:52:20.6756640Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5c38c0e5b27dc65d.json (deflated 38%) 2025-12-04T12:52:20.6757976Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aaba21dbf0cfec28.json (deflated 44%) 2025-12-04T12:52:20.6759317Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-01e27bee5aad4037.json (deflated 44%) 2025-12-04T12:52:20.6760688Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc4d7295af7e1928.json (deflated 37%) 2025-12-04T12:52:20.6762026Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-520313bd404147de.json (deflated 44%) 2025-12-04T12:52:20.6763360Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a797af99437b20e.json (deflated 44%) 2025-12-04T12:52:20.6764687Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-37d6ef52a05b7d44.json (deflated 44%) 2025-12-04T12:52:20.6766029Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2315e451ee36d9f1.json (deflated 45%) 2025-12-04T12:52:20.6767371Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ecd0d14f056dd21.json (deflated 37%) 2025-12-04T12:52:20.6768713Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1beae1b9515a1c25.json (deflated 44%) 2025-12-04T12:52:20.6770084Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78d4acbddf3c9607.json (deflated 37%) 2025-12-04T12:52:20.6771425Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c1a59c2fbd776179.json (deflated 42%) 2025-12-04T12:52:20.6772764Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-31b3b5953cdc94eb.json (deflated 36%) 2025-12-04T12:52:20.6774435Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0c79eda599deb55f.json (deflated 36%) 2025-12-04T12:52:20.6775945Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f54745fb5e614031.json (deflated 37%) 2025-12-04T12:52:20.6777447Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8182c37c5ee80d72.json (deflated 36%) 2025-12-04T12:52:20.6779140Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ab8cc14caa3fed1d.json (deflated 38%) 2025-12-04T12:52:20.6780652Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ffbdb3959954615b.json (deflated 36%) 2025-12-04T12:52:20.6782186Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cef388e39996bffb.json (deflated 37%) 2025-12-04T12:52:20.6783686Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-969953b595bcabd5.json (deflated 38%) 2025-12-04T12:52:20.6785183Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4f0371af0c88c192.json (deflated 37%) 2025-12-04T12:52:20.6786684Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-06587ef8e124e088.json (deflated 37%) 2025-12-04T12:52:20.6788182Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-85f349019e7692ee.json (deflated 37%) 2025-12-04T12:52:20.6789700Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7da4ea5ccfcc936b.json (deflated 45%) 2025-12-04T12:52:20.6791251Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0583cccd09b9e0c9.json (deflated 45%) 2025-12-04T12:52:20.6792586Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-940ee9ff608e1e76.json (deflated 36%) 2025-12-04T12:52:20.6793949Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-048493e6df7d5c5f.json (deflated 36%) 2025-12-04T12:52:20.6795286Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9f94739f99a45e5b.json (deflated 35%) 2025-12-04T12:52:20.6796620Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5d366f728b719f9d.json (deflated 38%) 2025-12-04T12:52:20.6797963Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1bd04d92534f968b.json (deflated 37%) 2025-12-04T12:52:20.6799307Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b2b4bdf6b2f4c521.json (deflated 38%) 2025-12-04T12:52:20.6800650Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-50ac6d99d2e163f0.json (deflated 36%) 2025-12-04T12:52:20.6802023Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-98ebf2381cda380a.json (deflated 57%) 2025-12-04T12:52:20.6803350Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-02a17fffec5d92e9.json (deflated 38%) 2025-12-04T12:52:20.6804692Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22fa4be36d0f0fec.json (deflated 37%) 2025-12-04T12:52:20.6806037Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b20781e7e3b68cd0.json (deflated 36%) 2025-12-04T12:52:20.6807371Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-055ede09a7156885.json (deflated 45%) 2025-12-04T12:52:20.6808694Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2dc318e9e7472b33.json (deflated 44%) 2025-12-04T12:52:20.6810037Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f6d75222f185b4d5.json (deflated 42%) 2025-12-04T12:52:20.6811376Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c6a79d484e935ea.json (deflated 42%) 2025-12-04T12:52:20.6812731Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-723363f561b2e381.json (deflated 37%) 2025-12-04T12:52:20.6814358Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cfa70be2e3e2e968.json (deflated 37%) 2025-12-04T12:52:20.6815862Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c34de188341d059e.json (deflated 57%) 2025-12-04T12:52:20.6817375Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f3f32df0d2792ec.json (deflated 37%) 2025-12-04T12:52:20.6818893Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ae835e0e1dbfe3ec.json (deflated 38%) 2025-12-04T12:52:20.6820421Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41f83edcff84215a.json (deflated 38%) 2025-12-04T12:52:20.6821934Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a860decd20224034.json (deflated 36%) 2025-12-04T12:52:20.6823437Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3963d6f58e4804f4.json (deflated 37%) 2025-12-04T12:52:20.6824973Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b2893112b7e4a0fd.json (deflated 37%) 2025-12-04T12:52:20.6826635Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5ae1d3298b84e247.json (deflated 45%) 2025-12-04T12:52:20.6827970Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ce25bc0bb1b2125d.json (deflated 37%) 2025-12-04T12:52:20.6829301Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b733f78747bdf8ab.json (deflated 37%) 2025-12-04T12:52:20.6830642Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f13a1853be14fff2.json (deflated 36%) 2025-12-04T12:52:20.6831982Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e0c8a225205b2b1b.json (deflated 37%) 2025-12-04T12:52:20.6833318Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e42484012944fc8.json (deflated 37%) 2025-12-04T12:52:20.6834687Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-76186faceb0b0d55.json (deflated 43%) 2025-12-04T12:52:20.6836022Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a9b9276398f16f1.json (deflated 37%) 2025-12-04T12:52:20.6837350Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c78e67ea2df6209d.json (deflated 37%) 2025-12-04T12:52:20.6838686Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7358e80ff8c5c8ba.json (deflated 37%) 2025-12-04T12:52:20.6840023Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6b10a6c1a1360dd8.json (deflated 37%) 2025-12-04T12:52:20.6841364Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4cf0679388a3449b.json (deflated 37%) 2025-12-04T12:52:20.6842694Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-26580fbd1387c903.json (deflated 37%) 2025-12-04T12:52:20.6844033Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a5db097ce06fd3ba.json (deflated 43%) 2025-12-04T12:52:20.6845393Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5f966f935cdcc510.json (deflated 37%) 2025-12-04T12:52:20.6846726Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d4a30815173ad737.json (deflated 43%) 2025-12-04T12:52:20.6848063Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41db9b895b480d7b.json (deflated 37%) 2025-12-04T12:52:20.6849401Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78a881039e14b5b2.json (deflated 37%) 2025-12-04T12:52:20.6850735Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dd62c820cf2113b8.json (deflated 37%) 2025-12-04T12:52:20.6852063Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c61700ab79aac30.json (deflated 43%) 2025-12-04T12:52:20.6853473Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aae9f946365ef7f7.json (deflated 45%) 2025-12-04T12:52:20.6855178Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d551f94bc57d8aff.json (deflated 37%) 2025-12-04T12:52:20.6856692Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dcae987a61c4873a.json (deflated 37%) 2025-12-04T12:52:20.6858197Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ca6886c3086a09a1.json (deflated 36%) 2025-12-04T12:52:20.6859697Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aa0548b0a0f16fd5.json (deflated 37%) 2025-12-04T12:52:20.6861212Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b837bfe0a5661087.json (deflated 36%) 2025-12-04T12:52:20.6862716Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-42e3e45e9107b083.json (deflated 37%) 2025-12-04T12:52:20.6864221Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2e64285e29d874e0.json (deflated 37%) 2025-12-04T12:52:20.6865766Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4a385e35494820f4.json (deflated 37%) 2025-12-04T12:52:20.6867226Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9cc203997a753f09.json (deflated 37%) 2025-12-04T12:52:20.6868560Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0913cee8f07dc0af.json (deflated 36%) 2025-12-04T12:52:20.6869895Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f5fb0cf096433beb.json (deflated 43%) 2025-12-04T12:52:20.6871244Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f4d28c8a1a46915c.json (deflated 37%) 2025-12-04T12:52:20.6872568Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b653d29a66c2470a.json (deflated 37%) 2025-12-04T12:52:20.6873909Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e7285ab6b5c9527e.json (deflated 44%) 2025-12-04T12:52:20.6875244Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a54941f41b8cfe8.json (deflated 46%) 2025-12-04T12:52:20.6876607Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-91e59cc2a798549a.json (deflated 37%) 2025-12-04T12:52:20.6877938Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e72c23544464af54.json (deflated 42%) 2025-12-04T12:52:20.6879611Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-45f80f9c137d75a5.json (deflated 37%) 2025-12-04T12:52:20.6880306Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9d93447ba16fc454.json (deflated 37%) 2025-12-04T12:52:20.6880992Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-24191bbc1a349edf.json (deflated 37%) 2025-12-04T12:52:20.6902000Z ##[group]Run # Remove any previous test reports if they exist 2025-12-04T12:52:20.6902201Z # Remove any previous test reports if they exist 2025-12-04T12:52:20.6902331Z rm -f test-reports-*.zip 2025-12-04T12:52:20.6902640Z zip -r "test-reports-${FILE_SUFFIX}.zip" test/test-reports -i '*.xml' -i '*.csv' 2025-12-04T12:52:20.6908588Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:52:20.6908783Z env: 2025-12-04T12:52:20.6908891Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:20.6908987Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:20.6909170Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:20.6909497Z 
DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:20.6909810Z FILE_SUFFIX: test-distributed-1-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084892 2025-12-04T12:52:20.6909904Z ##[endgroup] 2025-12-04T12:52:20.7095354Z adding: test/test-reports/python-pytest/distributed.test_dynamo_distributed/distributed.test_dynamo_distributed-7d68e185dc40b8e4.xml (deflated 89%) 2025-12-04T12:52:20.7096022Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-550d70029afd2dcd.xml (deflated 77%) 2025-12-04T12:52:20.7096670Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-20b4ace7c31b01bc.xml (deflated 77%) 2025-12-04T12:52:20.7097316Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-fca413b3f7307fd5.xml (deflated 77%) 2025-12-04T12:52:20.7098403Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-ee4bf9b90915483d.xml (deflated 77%) 2025-12-04T12:52:20.7099052Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-5878d33d525e22d1.xml (deflated 77%) 2025-12-04T12:52:20.7099890Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-71ced988c25116a1.xml (deflated 77%) 2025-12-04T12:52:20.7100770Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-00e807acf0912dba.xml (deflated 77%) 2025-12-04T12:52:20.7101702Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-b604ebea332b8d41.xml (deflated 77%) 2025-12-04T12:52:20.7102574Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-320024c0c6bb40b5.xml (deflated 77%) 2025-12-04T12:52:20.7103217Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-f868d67f33e24985.xml (deflated 28%) 2025-12-04T12:52:20.7104246Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-7c59f12ab3dc26b8.xml (deflated 77%) 2025-12-04T12:52:20.7105028Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-fd5feab7f24ea67e.xml (deflated 77%) 2025-12-04T12:52:20.7106058Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-42ba77c7c8182ea3.xml (deflated 77%) 2025-12-04T12:52:20.7106799Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_multiple_wrapping/distributed.fsdp.test_fsdp_multiple_wrapping-c4d1f6f933180ae5.xml (deflated 28%) 2025-12-04T12:52:20.7107470Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-e7575131d09c7d5b.xml (deflated 77%) 2025-12-04T12:52:20.7108110Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-fea5835408d37079.xml (deflated 77%) 2025-12-04T12:52:20.7108795Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-3b87fcc1c5f1359f.xml (deflated 77%) 2025-12-04T12:52:20.7109608Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-2c3852776dc4d6af.xml (deflated 77%) 2025-12-04T12:52:20.7110465Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-9e3a89401a26a2c7.xml (deflated 77%) 2025-12-04T12:52:20.7111340Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-dadce7936b268df6.xml (deflated 77%) 2025-12-04T12:52:20.7112176Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-fc03e360104e794a.xml (deflated 77%) 2025-12-04T12:52:20.7113013Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-2239b5ce820a8e80.xml (deflated 77%) 2025-12-04T12:52:20.7113866Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-750660f185e24025.xml (deflated 77%) 2025-12-04T12:52:20.7114740Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-8a47980366c1ac84.xml (deflated 77%) 2025-12-04T12:52:20.7115611Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-ca9abf7ee24c038e.xml (deflated 77%) 2025-12-04T12:52:20.7116455Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-7eb1dc39773c41c4.xml (deflated 77%) 2025-12-04T12:52:20.7117176Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-b3cb6eb5be1f3e0c.xml (deflated 28%) 2025-12-04T12:52:20.7117928Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-dcb6c7b6743de89e.xml (deflated 78%) 2025-12-04T12:52:20.7118679Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c0871d667bd4df8d.xml (deflated 78%) 2025-12-04T12:52:20.7119816Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-b1a99f4c33297699.xml (deflated 87%) 2025-12-04T12:52:20.7120734Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7332aead750b9bce.xml (deflated 78%) 2025-12-04T12:52:20.7121643Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c7d658062419b597.xml (deflated 78%) 2025-12-04T12:52:20.7122532Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-348cc3a828a50222.xml (deflated 78%) 2025-12-04T12:52:20.7123425Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-bb573131fa19ab29.xml (deflated 78%) 2025-12-04T12:52:20.7124545Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-16546bb6943a3c11.xml (deflated 87%) 2025-12-04T12:52:20.7125451Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-cdd2e74ccc0956b9.xml (deflated 78%) 2025-12-04T12:52:20.7126558Z 
adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-9650dbe5a6e76fd8.xml (deflated 87%) 2025-12-04T12:52:20.7127468Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-53bea78db525054e.xml (deflated 78%) 2025-12-04T12:52:20.7128366Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-470dd7f8801a129e.xml (deflated 78%) 2025-12-04T12:52:20.7129265Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-864712a0594b6ca2.xml (deflated 78%) 2025-12-04T12:52:20.7130198Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c88c69879eff0a17.xml (deflated 78%) 2025-12-04T12:52:20.7131070Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-ecd21e7500304b9f.xml (deflated 78%) 2025-12-04T12:52:20.7131956Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-e7cfa143d1c9be09.xml (deflated 78%) 2025-12-04T12:52:20.7132837Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-976f30802ad214bb.xml (deflated 78%) 2025-12-04T12:52:20.7134234Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2d8b05be053af669.xml (deflated 87%) 2025-12-04T12:52:20.7135566Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-ceb5badd22358e55.xml (deflated 90%) 2025-12-04T12:52:20.7136574Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-f5dcf7c66579f3c2.xml (deflated 78%) 2025-12-04T12:52:20.7137448Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-21e2c8920cf3865d.xml (deflated 78%) 2025-12-04T12:52:20.7138381Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-3dd1dab0649736e8.xml (deflated 78%) 2025-12-04T12:52:20.7139290Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-255818cdbe5fbd05.xml (deflated 78%) 2025-12-04T12:52:20.7140457Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2340b7a625d10704.xml (deflated 87%) 2025-12-04T12:52:20.7141610Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-6880e02fcbe22f17.xml (deflated 87%) 2025-12-04T12:52:20.7142759Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-a497c1942163e16f.xml (deflated 87%) 2025-12-04T12:52:20.7143749Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-a4ee3bf5f7a9a01f.xml (deflated 78%) 2025-12-04T12:52:20.7144689Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2d9ebf91db9daa02.xml (deflated 78%) 2025-12-04T12:52:20.7145607Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-04afe2c287023adc.xml (deflated 78%) 2025-12-04T12:52:20.7146624Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-80cc64b9f2eb85b8.xml (deflated 78%) 2025-12-04T12:52:20.7147740Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-c38510c20e07f456.xml (deflated 87%) 2025-12-04T12:52:20.7148837Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-bec03360e514672a.xml (deflated 87%) 2025-12-04T12:52:20.7149747Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7929a6c5753a5bf7.xml (deflated 78%) 2025-12-04T12:52:20.7150677Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-b0deb68b75574955.xml (deflated 78%) 2025-12-04T12:52:20.7151591Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-eda6c23a06d3c574.xml (deflated 78%) 2025-12-04T12:52:20.7152472Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-2e17c39fb483ae46.xml (deflated 78%) 2025-12-04T12:52:20.7153573Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-9ad7d4c20da7406b.xml (deflated 86%) 2025-12-04T12:52:20.7154462Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-5b391720a035fce0.xml (deflated 78%) 2025-12-04T12:52:20.7155353Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7e0ad2dc0411fa40.xml (deflated 78%) 2025-12-04T12:52:20.7156280Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-ab2eda46e6c1c6d0.xml (deflated 78%) 2025-12-04T12:52:20.7157196Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-7d4e43b394d06af0.xml (deflated 78%) 2025-12-04T12:52:20.7158038Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-d04e17113c0af8ba.xml (deflated 78%) 2025-12-04T12:52:20.7158909Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-a3b3387bd6019536.xml (deflated 77%) 2025-12-04T12:52:20.7159811Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-06677f872919b29b.xml (deflated 77%) 2025-12-04T12:52:20.7160684Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-da2c1a3b7d1cdaf6.xml (deflated 77%) 2025-12-04T12:52:20.7161425Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_dtensor_state_dict/distributed.fsdp.test_fsdp_dtensor_state_dict-467d89e082f97fc4.xml (deflated 28%) 2025-12-04T12:52:20.7162028Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4e48aa8d10589348.xml (deflated 77%) 2025-12-04T12:52:20.7162741Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3193e57821c2ebca.xml (deflated 77%) 2025-12-04T12:52:20.7163748Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9a7469c5b46925c2.xml (deflated 86%) 2025-12-04T12:52:20.7164621Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-227ae9e59104394c.xml (deflated 78%) 2025-12-04T12:52:20.7165525Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-38750597d70d3b79.xml (deflated 78%) 2025-12-04T12:52:20.7166605Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8d9e40030a96f20.xml (deflated 86%) 2025-12-04T12:52:20.7167960Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-257466dde9fb107b.xml (deflated 90%) 2025-12-04T12:52:20.7168750Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-94f5dd2e01869af2.xml (deflated 78%) 2025-12-04T12:52:20.7169659Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-42c522b0340c97ac.xml (deflated 78%) 2025-12-04T12:52:20.7170529Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-32205b0cc860e51d.xml (deflated 78%) 2025-12-04T12:52:20.7171401Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e08ad6962badbec0.xml (deflated 78%) 2025-12-04T12:52:20.7172270Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2ea67fcde569130f.xml (deflated 78%) 2025-12-04T12:52:20.7173141Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2525c9886ebe84d6.xml (deflated 77%) 2025-12-04T12:52:20.7174269Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-265cb7987b98bd4a.xml (deflated 77%) 2025-12-04T12:52:20.7175158Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0dad56460685f27c.xml (deflated 77%) 2025-12-04T12:52:20.7176306Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0a57eff14e5fabd3.xml (deflated 86%) 2025-12-04T12:52:20.7177191Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f9981ec75d7ffd49.xml (deflated 77%) 2025-12-04T12:52:20.7178397Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9f763e8043031072.xml (deflated 86%) 2025-12-04T12:52:20.7179424Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-02dc996acd3ff226.xml (deflated 77%) 2025-12-04T12:52:20.7180340Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a9525f3a3720890d.xml (deflated 77%) 
2025-12-04T12:52:20.7181234Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0add77cc4faf0004.xml (deflated 77%) 2025-12-04T12:52:20.7182124Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-fb044786f28290de.xml (deflated 77%) 2025-12-04T12:52:20.7183049Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2d3368174cbc9b5a.xml (deflated 77%) 2025-12-04T12:52:20.7184147Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f48241fc1d70a928.xml (deflated 86%) 2025-12-04T12:52:20.7185032Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4808bec29186d3a1.xml (deflated 77%) 2025-12-04T12:52:20.7185951Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0fe450ea21eea83e.xml (deflated 77%) 2025-12-04T12:52:20.7186845Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4119ccbda03fb8bd.xml (deflated 77%) 2025-12-04T12:52:20.7187728Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-77506912df9607dd.xml (deflated 77%) 2025-12-04T12:52:20.7188866Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-71967038f6397bcb.xml (deflated 86%) 2025-12-04T12:52:20.7189961Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7aea602ded691711.xml (deflated 86%) 2025-12-04T12:52:20.7191265Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-588db22c786ffc0c.xml (deflated 86%) 2025-12-04T12:52:20.7192261Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d2c93dc13a89050c.xml (deflated 86%) 2025-12-04T12:52:20.7193145Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e269a47641789945.xml (deflated 77%) 2025-12-04T12:52:20.7194004Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d0e2108c889b6f40.xml (deflated 77%) 2025-12-04T12:52:20.7195068Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d7cc16231ece4156.xml (deflated 86%) 2025-12-04T12:52:20.7195951Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-01da52837cd28026.xml (deflated 77%) 2025-12-04T12:52:20.7196781Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aa4201c32172891c.xml (deflated 77%) 2025-12-04T12:52:20.7198079Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4da7b120579aed6b.xml (deflated 90%) 2025-12-04T12:52:20.7198861Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8c1fa5c204db7919.xml (deflated 77%) 2025-12-04T12:52:20.7199728Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-79d8c8140e8d4a45.xml (deflated 77%) 2025-12-04T12:52:20.7200566Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5b8cddc87d4e2da4.xml (deflated 77%) 
2025-12-04T12:52:20.7201487Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-312ddbdab57572f7.xml (deflated 77%) 2025-12-04T12:52:20.7202468Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-fb1563559edf316c.xml (deflated 86%) 2025-12-04T12:52:20.7203761Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-63a7a52cd3aa8936.xml (deflated 90%) 2025-12-04T12:52:20.7204546Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a6324f00d63e140d.xml (deflated 77%) 2025-12-04T12:52:20.7205387Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0cab4f0cffa47b1f.xml (deflated 77%) 2025-12-04T12:52:20.7206229Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a997c4f2b1c679bc.xml (deflated 77%) 2025-12-04T12:52:20.7207254Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bb42278badc3bd05.xml (deflated 86%) 2025-12-04T12:52:20.7208293Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1e66930a4930311d.xml (deflated 86%) 2025-12-04T12:52:20.7209348Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-caca850dfa53af0d.xml (deflated 86%) 2025-12-04T12:52:20.7210202Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5ab6f8f72e0857a0.xml (deflated 77%) 2025-12-04T12:52:20.7211064Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-82a2bf200d1dcaa2.xml (deflated 77%) 2025-12-04T12:52:20.7211934Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a014b9bd1b37d049.xml (deflated 77%) 2025-12-04T12:52:20.7212780Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-95f17e90ca4b9755.xml (deflated 77%) 2025-12-04T12:52:20.7213864Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7ff8c73ee302c339.xml (deflated 77%) 2025-12-04T12:52:20.7214782Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0cb6a3efe573e986.xml (deflated 77%) 2025-12-04T12:52:20.7216152Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-44c122860a547cf4.xml (deflated 90%) 2025-12-04T12:52:20.7217015Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-668210e8e09c8dd9.xml (deflated 77%) 2025-12-04T12:52:20.7218360Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9448ec7a0a61b5a6.xml (deflated 90%) 2025-12-04T12:52:20.7219262Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8d6fd75ad2c1f260.xml (deflated 77%) 2025-12-04T12:52:20.7220355Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aa2eb835ecdd4375.xml (deflated 86%) 2025-12-04T12:52:20.7221264Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5e3a192bec2a8308.xml (deflated 77%) 
2025-12-04T12:52:20.7222683Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c427df5212a82823.xml (deflated 90%) 2025-12-04T12:52:20.7223541Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a459d6ece2e0d396.xml (deflated 77%) 2025-12-04T12:52:20.7224468Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-82bd08cbda0b3168.xml (deflated 77%) 2025-12-04T12:52:20.7225421Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cb7d608d20fa1845.xml (deflated 77%) 2025-12-04T12:52:20.7226666Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3bf4168a6952dca5.xml (deflated 86%) 2025-12-04T12:52:20.7227497Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-69250cff44e166fa.xml (deflated 77%) 2025-12-04T12:52:20.7228379Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-311a7d97c78eb59e.xml (deflated 77%) 2025-12-04T12:52:20.7229420Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5f502e67619c39f3.xml (deflated 86%) 2025-12-04T12:52:20.7230297Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8eee5abee9febb4.xml (deflated 77%) 2025-12-04T12:52:20.7231129Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-acff5684d72dd2d3.xml (deflated 77%) 2025-12-04T12:52:20.7231972Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-72a03384c8cb338e.xml (deflated 77%) 2025-12-04T12:52:20.7232860Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-efc6da476f35386f.xml (deflated 77%) 2025-12-04T12:52:20.7233689Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ecdb0e90ac1c2bc1.xml (deflated 77%) 2025-12-04T12:52:20.7234727Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9524e9df873b8be0.xml (deflated 86%) 2025-12-04T12:52:20.7235776Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ad9e0258dc223929.xml (deflated 86%) 2025-12-04T12:52:20.7236820Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-17a67113f8be5d53.xml (deflated 86%) 2025-12-04T12:52:20.7237647Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c0681c30ea8c1a74.xml (deflated 77%) 2025-12-04T12:52:20.7238525Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-27fc2bea2cad5f2f.xml (deflated 77%) 2025-12-04T12:52:20.7239338Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-834db467a2bb808c.xml (deflated 77%) 2025-12-04T12:52:20.7240206Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f48f6c290350ef90.xml (deflated 78%) 2025-12-04T12:52:20.7241259Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1ee2f71fb8de6413.xml (deflated 87%) 
2025-12-04T12:52:20.7242144Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7f8857b703d650c5.xml (deflated 78%) 2025-12-04T12:52:20.7243205Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-19e11b35947b0a14.xml (deflated 86%) 2025-12-04T12:52:20.7244074Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3b5d6f54eb5c8ad3.xml (deflated 78%) 2025-12-04T12:52:20.7245183Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b2eb1b61ddd90ac8.xml (deflated 86%) 2025-12-04T12:52:20.7246025Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e6810d3a1c38013d.xml (deflated 78%) 2025-12-04T12:52:20.7246914Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-53dff883b0afb17e.xml (deflated 78%) 2025-12-04T12:52:20.7247799Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2c92b48ae22d6a39.xml (deflated 78%) 2025-12-04T12:52:20.7248644Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bb06b260fb006313.xml (deflated 77%) 2025-12-04T12:52:20.7249471Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-29ffb7b96244526a.xml (deflated 77%) 2025-12-04T12:52:20.7250493Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-64a83fa5a2cd03db.xml (deflated 86%) 2025-12-04T12:52:20.7251556Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1d1edf2996f09e22.xml (deflated 86%) 2025-12-04T12:52:20.7252582Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d210f9114da1dd7.xml (deflated 86%) 2025-12-04T12:52:20.7253513Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ffec29f281535337.xml (deflated 77%) 2025-12-04T12:52:20.7254608Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-878c5ded0afd20c5.xml (deflated 77%) 2025-12-04T12:52:20.7255499Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6c1c7d4a809089a1.xml (deflated 77%) 2025-12-04T12:52:20.7256385Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-78a9e7f81c58cb5d.xml (deflated 77%) 2025-12-04T12:52:20.7257009Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6f57499824dd1125.xml (deflated 28%) 2025-12-04T12:52:20.7257853Z adding: test/test-reports/python-pytest/distributed.algorithms.ddp_comm_hooks.test_ddp_hooks/distributed.algorithms.ddp_comm_hooks.test_ddp_hooks-671b5e8f8d643201.xml (deflated 77%) 2025-12-04T12:52:20.7258507Z adding: test/test-reports/python-pytest/distributed.tensor.test_op_schema/distributed.tensor.test_op_schema-bb5a16ac0960925a.xml (deflated 50%) 2025-12-04T12:52:20.7259235Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_nested_dict/distributed.checkpoint.test_nested_dict-81f92522f1154383.xml (deflated 51%) 2025-12-04T12:52:20.7260128Z adding: 
test/test-reports/python-pytest/distributed.checkpoint.test_consolidate_hf_safetensors/distributed.checkpoint.test_consolidate_hf_safetensors-d914312b5a4148e2.xml (deflated 77%) 2025-12-04T12:52:20.7260991Z adding: test/test-reports/python-pytest/distributed.checkpoint._experimental.test_barriers/distributed.checkpoint._experimental.test_barriers-d8f5a49da0f436d9.xml (deflated 52%) 2025-12-04T12:52:20.7261944Z adding: test/test-reports/python-pytest/distributed.pipelining.test_transformer/distributed.pipelining.test_transformer-e70c997724b03d0e.xml (deflated 40%) 2025-12-04T12:52:20.7262715Z adding: test/test-reports/python-pytest/distributed.flight_recorder.test_fr_analysis/distributed.flight_recorder.test_fr_analysis-aba4e9f61260e449.xml (deflated 73%) 2025-12-04T12:52:20.7263410Z adding: test/test-reports/python-pytest/distributed._composable.test_contract/distributed._composable.test_contract-43d2ccf9f44c35a5.xml (deflated 61%) 2025-12-04T12:52:20.7264158Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_dedup_tensors/distributed.checkpoint.test_dedup_tensors-98db0ae6ec3ef072.xml (deflated 38%) 2025-12-04T12:52:20.7264822Z adding: test/test-reports/python-pytest/distributed.pipelining.test_pipe/distributed.pipelining.test_pipe-b65ad592f97073ad.xml (deflated 67%) 2025-12-04T12:52:20.7265523Z adding: test/test-reports/python-pytest/distributed.pipelining.test_backward/distributed.pipelining.test_backward-4f9205b7617a9aaf.xml (deflated 72%) 2025-12-04T12:52:20.7266305Z adding: test/test-reports/python-pytest/distributed.test_nvshmem_triton/distributed.test_nvshmem_triton-2d1da825c1a177a7.xml (deflated 95%) 2025-12-04T12:52:20.7266918Z adding: test/test-reports/python-pytest/distributed.tensor.test_dtensor/distributed.tensor.test_dtensor-780171e06b9d081c.xml (deflated 89%) 2025-12-04T12:52:20.7267459Z adding: test/test-reports/python-pytest/distributed.test_p2p_ipc/distributed.test_p2p_ipc-22d7fd7242fa3e1d.xml (deflated 46%) 2025-12-04T12:52:20.7268116Z adding: test/test-reports/python-pytest/distributed.tensor.test_common_rules/distributed.tensor.test_common_rules-f2e475ef5a58885a.xml (deflated 80%) 2025-12-04T12:52:20.7268875Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_hf_safetensor_e2e/distributed.checkpoint.test_hf_safetensor_e2e-49bc702c32e1be14.xml (deflated 80%) 2025-12-04T12:52:20.7269502Z adding: test/test-reports/python-pytest/distributed.tensor.test_dynamic/distributed.tensor.test_dynamic-58a11920d980fced.xml (deflated 68%) 2025-12-04T12:52:20.7270198Z adding: test/test-reports/python-pytest/distributed.checkpoint.e2e.test_fsdp_ep/distributed.checkpoint.e2e.test_fsdp_ep-90e84c8c71d0d519.xml (deflated 44%) 2025-12-04T12:52:20.7270905Z adding: test/test-reports/python-pytest/distributed.pipelining.test_unflatten/distributed.pipelining.test_unflatten-9c61fbce9d8da54e.xml (deflated 41%) 2025-12-04T12:52:20.7271612Z adding: test/test-reports/python-pytest/distributed.tensor.test_dtensor_testbase/distributed.tensor.test_dtensor_testbase-90ed30d8fe3a2fcc.xml (deflated 40%) 2025-12-04T12:52:20.7272329Z adding: test/test-reports/python-pytest/distributed.tensor.test_redistribute/distributed.tensor.test_redistribute-1fba6503450910ca.xml (deflated 84%) 2025-12-04T12:52:20.7272867Z adding: test/test-reports/python-pytest/distributed.test_nvshmem/distributed.test_nvshmem-c601d1a92c913214.xml (deflated 95%) 2025-12-04T12:52:20.7273510Z adding: 
test/test-reports/python-pytest/distributed.tensor.test_attention/distributed.tensor.test_attention-f7e42a024369f922.xml (deflated 82%) 2025-12-04T12:52:20.7274216Z adding: test/test-reports/python-pytest/distributed.tensor.test_convolution_ops/distributed.tensor.test_convolution_ops-37d4b4387fe7c9dd.xml (deflated 87%) 2025-12-04T12:52:20.7274947Z adding: test/test-reports/python-pytest/distributed.checkpoint.fsdp.test_fsdp_dsd/distributed.checkpoint.fsdp.test_fsdp_dsd-1bb4a1e7d3cbef72.xml (deflated 75%) 2025-12-04T12:52:20.7275665Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_save_load_api/distributed.checkpoint.test_save_load_api-607495a1278fa4ba.xml (deflated 49%) 2025-12-04T12:52:20.7276438Z adding: test/test-reports/python-pytest/distributed.tensor.debug.test_comm_mode_features/distributed.tensor.debug.test_comm_mode_features-f7a4a3df89327d4b.xml (deflated 67%) 2025-12-04T12:52:20.7277133Z adding: test/test-reports/python-pytest/distributed.tensor.test_dtensor_ops/distributed.tensor.test_dtensor_ops-ea81859469c32dce.xml (deflated 28%) 2025-12-04T12:52:20.7277653Z adding: test/test-reports/python-pytest/distributed.test_debug/distributed.test_debug-be889cccd8acb9a9.xml (deflated 40%) 2025-12-04T12:52:20.7278338Z adding: test/test-reports/python-pytest/distributed.test_overlap_bucketing_unit/distributed.test_overlap_bucketing_unit-ca2c159a43fd5a2e.xml (deflated 79%) 2025-12-04T12:52:20.7279604Z adding: test/test-reports/python-pytest/distributed.checkpoint._experimental.test_checkpoint_writer/distributed.checkpoint._experimental.test_checkpoint_writer-b51ac79d06c0ddb7.xml (deflated 80%) 2025-12-04T12:52:20.7280512Z adding: test/test-reports/python-pytest/distributed.checkpoint._experimental.test_checkpointer/distributed.checkpoint._experimental.test_checkpointer-181aea9ab4e75ef7.xml (deflated 71%) 2025-12-04T12:52:20.7281130Z adding: test/test-reports/python-pytest/distributed.tensor.test_init/distributed.tensor.test_init-b970b50400f392fc.xml (deflated 79%) 2025-12-04T12:52:20.7281839Z adding: test/test-reports/python-pytest/distributed._composable.test_checkpoint/distributed._composable.test_checkpoint-a1aa396939174424.xml (deflated 72%) 2025-12-04T12:52:20.7282659Z adding: test/test-reports/python-pytest/distributed._tools.test_fsdp2_mem_tracker/distributed._tools.test_fsdp2_mem_tracker-3cf763bb11a5de99.xml (deflated 59%) 2025-12-04T12:52:20.7283526Z adding: test/test-reports/python-pytest/distributed._composable.test_replicate_mixed_precision/distributed._composable.test_replicate_mixed_precision-36b9cc9e417e77fd.xml (deflated 79%) 2025-12-04T12:52:20.7284284Z adding: test/test-reports/python-pytest/distributed.checkpoint.e2e.test_fine_tuning/distributed.checkpoint.e2e.test_fine_tuning-c68ce24632e972fe.xml (deflated 38%) 2025-12-04T12:52:20.7284957Z adding: test/test-reports/python-pytest/distributed.tensor.test_matrix_ops/distributed.tensor.test_matrix_ops-225fb5a0fab4f212.xml (deflated 88%) 2025-12-04T12:52:20.7285632Z adding: test/test-reports/python-pytest/distributed.tensor.test_optimizers/distributed.tensor.test_optimizers-208a364b29da2421.xml (deflated 82%) 2025-12-04T12:52:20.7286290Z adding: test/test-reports/python-pytest/distributed.test_symmetric_memory/distributed.test_symmetric_memory-e6666f579f07be4f.xml (deflated 96%) 2025-12-04T12:52:20.7287007Z adding: test/test-reports/python-pytest/distributed._tools.test_runtime_estimator/distributed._tools.test_runtime_estimator-9422bc676dd3a656.xml (deflated 59%) 2025-12-04T12:52:20.7287866Z adding: 
test/test-reports/python-pytest/distributed._composable.test_replicate_with_compiler/distributed._composable.test_replicate_with_compiler-845dc753fbff3b86.xml (deflated 77%) 2025-12-04T12:52:20.7288761Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_autograd/distributed._composable.fsdp.test_fully_shard_autograd-6a8bc02b72927b79.xml (deflated 68%) 2025-12-04T12:52:20.7289735Z adding: test/test-reports/python-pytest/distributed._composable.test_composability.test_2d_composability/distributed._composable.test_composability.test_2d_composability-218cfa8a31a3ba84.xml (deflated 82%) 2025-12-04T12:52:20.7290448Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_optim_state/distributed.fsdp.test_fsdp_optim_state-a3d7bfb88e0bb04b.xml (deflated 93%) 2025-12-04T12:52:20.7291144Z adding: test/test-reports/python-pytest/distributed.test_c10d_logger/distributed.test_c10d_logger-087942ef032695a4.xml (deflated 50%) 2025-12-04T12:52:20.7291930Z adding: test/test-reports/python-pytest/distributed._composable.test_replicate_training/distributed._composable.test_replicate_training-2cbeb0e1e9d2c847.xml (deflated 79%) 2025-12-04T12:52:20.7292554Z adding: test/test-reports/python-pytest/distributed.rpc.test_share_memory/distributed.rpc.test_share_memory-7afea101a44bab53.xml (deflated 48%) 2025-12-04T12:52:20.7293207Z adding: test/test-reports/python-pytest/distributed.tensor.test_op_strategy/distributed.tensor.test_op_strategy-6fbbc916638ee901.xml (deflated 85%) 2025-12-04T12:52:20.7294109Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_grad_acc/distributed.fsdp.test_fsdp_grad_acc-a75842029d7b9dcc.xml (deflated 75%) 2025-12-04T12:52:20.7294899Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_state_dict_stager/distributed.checkpoint.test_state_dict_stager-c8decf93ed909c05.xml (deflated 77%) 2025-12-04T12:52:20.7295658Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_freezing_weights/distributed.fsdp.test_fsdp_freezing_weights-c610b4e9e056a60a.xml (deflated 95%) 2025-12-04T12:52:20.7296468Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_init/distributed._composable.fsdp.test_fully_shard_init-94adf46d5612666a.xml (deflated 87%) 2025-12-04T12:52:20.7297193Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_flatten_params/distributed.fsdp.test_fsdp_flatten_params-1722984b0a3e650a.xml (deflated 83%) 2025-12-04T12:52:20.7297838Z adding: test/test-reports/python-pytest/distributed.test_composability/distributed.test_composability-4dcd79eb001aa4cf.xml (deflated 84%) 2025-12-04T12:52:20.7298570Z adding: test/test-reports/python-pytest/distributed.test_multi_threaded_pg/distributed.test_multi_threaded_pg-cb00591a34ee6ad2.xml (deflated 82%) 2025-12-04T12:52:20.7299461Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_extensions/distributed._composable.fsdp.test_fully_shard_extensions-e4e54db12d00fc4b.xml (deflated 72%) 2025-12-04T12:52:20.7300040Z adding: test/test-reports/python-pytest/distributed.fsdp.test_wrap/distributed.fsdp.test_wrap-8d38fac6f1a86713.xml (deflated 83%) 2025-12-04T12:52:20.7300743Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_hybrid_shard/distributed.fsdp.test_fsdp_hybrid_shard-b37436896c0f0a07.xml (deflated 70%) 2025-12-04T12:52:20.7301602Z adding: 
test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_training/distributed._composable.fsdp.test_fully_shard_training-6c9b30f951a7219e.xml (deflated 81%) 2025-12-04T12:52:20.7302359Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-d961ab4b1fb94450.xml (deflated 56%) 2025-12-04T12:52:20.7303092Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-2749604adbdd83d7.xml (deflated 34%) 2025-12-04T12:52:20.7303835Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-a08d4078ed5ab3a4.xml (deflated 36%) 2025-12-04T12:52:20.7304598Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-f043e483cea1f140.xml (deflated 35%) 2025-12-04T12:52:20.7305337Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-562cdf6dc98614a4.xml (deflated 35%) 2025-12-04T12:52:20.7306188Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-7a355d7d848a3783.xml (deflated 35%) 2025-12-04T12:52:20.7306906Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-e938b1bd7ee21e63.xml (deflated 35%) 2025-12-04T12:52:20.7307620Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-7b620378c03b2b8c.xml (deflated 36%) 2025-12-04T12:52:20.7308331Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-51e337155d132168.xml (deflated 38%) 2025-12-04T12:52:20.7309045Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-858bb2cae53302d6.xml (deflated 38%) 2025-12-04T12:52:20.7309789Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-f578d6bffa26b363.xml (deflated 38%) 2025-12-04T12:52:20.7310498Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-3d9a69729c5194dd.xml (deflated 37%) 2025-12-04T12:52:20.7311214Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-907751cfd0f9a14d.xml (deflated 37%) 2025-12-04T12:52:20.7311926Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-9a92a0723441ebbb.xml (deflated 36%) 2025-12-04T12:52:20.7312640Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-000e8f0311241e72.xml (deflated 37%) 2025-12-04T12:52:20.7313352Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-18ef89366410fd29.xml (deflated 37%) 2025-12-04T12:52:20.7314127Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-ae7171f8ebe30954.xml (deflated 36%) 2025-12-04T12:52:20.7314836Z adding: 
test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-03f5357c50df6990.xml (deflated 36%) 2025-12-04T12:52:20.7315559Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-df0148e80116049a.xml (deflated 36%) 2025-12-04T12:52:20.7316278Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-f8722be29ef1f355.xml (deflated 36%) 2025-12-04T12:52:20.7316982Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-5b2d107716225579.xml (deflated 36%) 2025-12-04T12:52:20.7317703Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-ad4182055d9f07f8.xml (deflated 36%) 2025-12-04T12:52:20.7318414Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-b572842ca37510d6.xml (deflated 36%) 2025-12-04T12:52:20.7319133Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-0b1a30e997ca6431.xml (deflated 36%) 2025-12-04T12:52:20.7319889Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-e562e66ae42d90cb.xml (deflated 36%) 2025-12-04T12:52:20.7320599Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-e753f34d0efc412f.xml (deflated 36%) 2025-12-04T12:52:20.7321319Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-3e869ce9df9ec961.xml (deflated 36%) 2025-12-04T12:52:20.7322043Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-2b82b1f0b5e5f8cd.xml (deflated 36%) 2025-12-04T12:52:20.7322760Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-a8c407980a31ffc0.xml (deflated 37%) 2025-12-04T12:52:20.7323476Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-2efe6b4638116b91.xml (deflated 36%) 2025-12-04T12:52:20.7324192Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-98c90d11be4a4494.xml (deflated 36%) 2025-12-04T12:52:20.7324931Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-4931d5d769ccdcd0.xml (deflated 36%) 2025-12-04T12:52:20.7325639Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-05476064036aedd6.xml (deflated 36%) 2025-12-04T12:52:20.7326354Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-8e55f782fe295c55.xml (deflated 37%) 2025-12-04T12:52:20.7327058Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-fed8717c328843a6.xml (deflated 37%) 2025-12-04T12:52:20.7327778Z adding: 
test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-de76ea5d9fe707d8.xml (deflated 36%) 2025-12-04T12:52:20.7328486Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-0f7e482c95e1e619.xml (deflated 36%) 2025-12-04T12:52:20.7329276Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-7ab8d1b1bfba2dc7.xml (deflated 36%) 2025-12-04T12:52:20.7329985Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-79d029624b0186af.xml (deflated 36%) 2025-12-04T12:52:20.7330696Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-7ef38c3a629f7d6a.xml (deflated 36%) 2025-12-04T12:52:20.7331416Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-a3eb91c932561739.xml (deflated 36%) 2025-12-04T12:52:20.7332126Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-5c4ed4bc28536a40.xml (deflated 36%) 2025-12-04T12:52:20.7332846Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-4d50dcf186234952.xml (deflated 36%) 2025-12-04T12:52:20.7333628Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-ce507d7407dcfdcd.xml (deflated 36%) 2025-12-04T12:52:20.7334598Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-fb440b5504373386.xml (deflated 36%) 2025-12-04T12:52:20.7335381Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-170bafadd5b2b85e.xml (deflated 38%) 2025-12-04T12:52:20.7336144Z adding: test/test-reports/python-pytest/distributed.rpc.cuda.test_tensorpipe_agent/distributed.rpc.cuda.test_tensorpipe_agent-58170fe4322a80c7.xml (deflated 56%) 2025-12-04T12:52:20.7336955Z adding: test/test-reports/python-pytest/distributed.optim.test_zero_redundancy_optimizer/distributed.optim.test_zero_redundancy_optimizer-541994707c39cee5.xml (deflated 92%) 2025-12-04T12:52:20.7337521Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-08577689f3d858a6.xml (deflated 34%) 2025-12-04T12:52:20.7338095Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c743ab120f5be65e.xml (deflated 36%) 2025-12-04T12:52:20.7338648Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1a323813051995ff.xml (deflated 36%) 2025-12-04T12:52:20.7339224Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5c255dbdc27d4d77.xml (deflated 36%) 2025-12-04T12:52:20.7339787Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c9f1202b2ef2d0e8.xml (deflated 35%) 2025-12-04T12:52:20.7340374Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8281b785ed89747f.xml (deflated 35%) 2025-12-04T12:52:20.7340954Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-de1cfac33910faa2.xml (deflated 35%) 2025-12-04T12:52:20.7341509Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-864ab6c46117080c.xml (deflated 36%) 2025-12-04T12:52:20.7342076Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-76784aced4c97984.xml (deflated 35%) 2025-12-04T12:52:20.7342645Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-703de7f7dca9caca.xml (deflated 36%) 2025-12-04T12:52:20.7343202Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0adaa780135147c4.xml (deflated 36%) 2025-12-04T12:52:20.7343771Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9b446c4fb19b7fa9.xml (deflated 35%) 2025-12-04T12:52:20.7344338Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a7525f86ad26c33d.xml (deflated 36%) 2025-12-04T12:52:20.7344962Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d6753a042db8c209.xml (deflated 35%) 2025-12-04T12:52:20.7345521Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2349534806fd7876.xml (deflated 36%) 2025-12-04T12:52:20.7346174Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-42ee5035490db9e3.xml (deflated 35%) 2025-12-04T12:52:20.7346726Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-50ae93e5877e6267.xml (deflated 35%) 2025-12-04T12:52:20.7347266Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a89286bdf69beec6.xml (deflated 36%) 2025-12-04T12:52:20.7347822Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3fe0b73bb8411ca0.xml (deflated 36%) 2025-12-04T12:52:20.7348370Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4b626fd6cfef7d3a.xml (deflated 36%) 2025-12-04T12:52:20.7348906Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8330e64091d76cd1.xml (deflated 35%) 2025-12-04T12:52:20.7349461Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c23dbc048e48feb8.xml (deflated 36%) 2025-12-04T12:52:20.7350030Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-89bb327f63e4e8bb.xml (deflated 35%) 2025-12-04T12:52:20.7350588Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9c8a9e0d041cedea.xml (deflated 36%) 2025-12-04T12:52:20.7351136Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-42322dfcd604c1a2.xml (deflated 47%) 2025-12-04T12:52:20.7351675Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-05269c810dd53a0a.xml (deflated 35%) 2025-12-04T12:52:20.7352227Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-496a22975dc71079.xml (deflated 35%) 2025-12-04T12:52:20.7352769Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-63ac84cde87e453b.xml (deflated 35%) 2025-12-04T12:52:20.7353329Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4f81974b0076c9ff.xml (deflated 37%) 2025-12-04T12:52:20.7353869Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6030e8ef08b1e09a.xml (deflated 35%) 2025-12-04T12:52:20.7354414Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bd13602843a8b4fd.xml (deflated 36%) 2025-12-04T12:52:20.7354994Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9df479069693f235.xml (deflated 36%) 2025-12-04T12:52:20.7355542Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4af234e53b06ab6b.xml (deflated 36%) 2025-12-04T12:52:20.7356106Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cc6adfb19ea74de2.xml (deflated 36%) 2025-12-04T12:52:20.7356651Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8f1e4f77d9fcfd90.xml (deflated 36%) 2025-12-04T12:52:20.7357216Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-27f8ee33cfcf037f.xml (deflated 36%) 2025-12-04T12:52:20.7357762Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cdeff74b74265a35.xml (deflated 36%) 2025-12-04T12:52:20.7358303Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a9993b41f082320c.xml (deflated 39%) 2025-12-04T12:52:20.7358911Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3652407d28f2c215.xml (deflated 36%) 2025-12-04T12:52:20.7359456Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d3dba64b9a4678ed.xml (deflated 37%) 2025-12-04T12:52:20.7360007Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3a5b171659e4eecd.xml (deflated 37%) 2025-12-04T12:52:20.7360552Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c138db9f1cd0e29e.xml (deflated 37%) 2025-12-04T12:52:20.7361093Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-113748ca2c5988c5.xml (deflated 44%) 2025-12-04T12:52:20.7361646Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9d127482d5c0a15f.xml (deflated 36%) 2025-12-04T12:52:20.7362187Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0c6be5601f204f6a.xml (deflated 35%) 2025-12-04T12:52:20.7362750Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f818599ff0d45c17.xml (deflated 36%) 2025-12-04T12:52:20.7363294Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6285dc53d0288723.xml (deflated 36%) 2025-12-04T12:52:20.7363866Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-13ada6365eaa3764.xml (deflated 36%) 2025-12-04T12:52:20.7364419Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a8eee205d4f58a65.xml (deflated 36%) 2025-12-04T12:52:20.7364958Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-88abaaf5c04af9c6.xml (deflated 36%) 2025-12-04T12:52:20.7365515Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f3a90decd7629fa5.xml (deflated 34%) 2025-12-04T12:52:20.7366059Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9d96b951f094adb8.xml (deflated 34%) 2025-12-04T12:52:20.7366602Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-21b096edd5ee1b6a.xml (deflated 34%) 2025-12-04T12:52:20.7367154Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a0b61d4860845bc5.xml (deflated 34%) 2025-12-04T12:52:20.7367701Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0dbdae2b70ece5b8.xml (deflated 34%) 2025-12-04T12:52:20.7368252Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-af13cf2684481c71.xml (deflated 34%) 2025-12-04T12:52:20.7368848Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8d92a42ccc718652.xml (deflated 34%) 2025-12-04T12:52:20.7369403Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3c2b86f8fd4d9656.xml (deflated 34%) 2025-12-04T12:52:20.7369950Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5a0ef3ecf28b8b71.xml (deflated 34%) 2025-12-04T12:52:20.7370503Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-dde6a545db9a9d8a.xml (deflated 34%) 2025-12-04T12:52:20.7371047Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e5facf21e5ab1561.xml (deflated 34%) 2025-12-04T12:52:20.7371589Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8fe4943e76f9400e.xml (deflated 34%) 2025-12-04T12:52:20.7372144Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1c6197817aebc3a0.xml (deflated 34%) 2025-12-04T12:52:20.7372689Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-aec658dbe3b6bd3e.xml (deflated 34%) 2025-12-04T12:52:20.7373375Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eda16941220f320e.xml (deflated 34%) 2025-12-04T12:52:20.7374103Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e4305ddff77c0f5f.xml (deflated 35%) 2025-12-04T12:52:20.7374663Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0e3a131c8f0d73a7.xml (deflated 34%) 2025-12-04T12:52:20.7375234Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1ba7f15a8c05fd39.xml (deflated 34%) 2025-12-04T12:52:20.7375795Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-af2c578cafcc8d63.xml (deflated 34%) 2025-12-04T12:52:20.7376361Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-578bb605dc4ba552.xml (deflated 34%) 2025-12-04T12:52:20.7376924Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b86217cb65e9b710.xml (deflated 47%) 2025-12-04T12:52:20.7377484Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f2854c8345d5fc1c.xml (deflated 34%) 2025-12-04T12:52:20.7378049Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c653d1bf5ed2c0fa.xml (deflated 34%) 2025-12-04T12:52:20.7378808Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-55cd7c7ba73bcdc2.xml (deflated 34%) 2025-12-04T12:52:20.7379397Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-ffbbe82dbea62111.xml (deflated 34%) 2025-12-04T12:52:20.7379963Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-538ab2b49097f28c.xml (deflated 35%) 2025-12-04T12:52:20.7380529Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fe265780a95359c3.xml (deflated 35%) 2025-12-04T12:52:20.7381108Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-faa48c97f4df83cb.xml (deflated 34%) 2025-12-04T12:52:20.7381672Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c0cd36b746cb3623.xml (deflated 36%) 2025-12-04T12:52:20.7382244Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3b1171fb862d722e.xml (deflated 35%) 2025-12-04T12:52:20.7382808Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-df4dd3eb2cdaa291.xml (deflated 35%) 2025-12-04T12:52:20.7383384Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-dd69f8dcdadc9fb4.xml (deflated 35%) 2025-12-04T12:52:20.7383988Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f54bdf9dde5bf97e.xml (deflated 35%) 2025-12-04T12:52:20.7384549Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-83c9fcb3057380d5.xml (deflated 35%) 2025-12-04T12:52:20.7385116Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e826f0e43397861e.xml (deflated 35%) 2025-12-04T12:52:20.7385678Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e73673b2d41bd335.xml (deflated 34%) 2025-12-04T12:52:20.7386245Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1456794f12812e80.xml (deflated 35%) 2025-12-04T12:52:20.7386810Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e4e8c70c34dfdfe1.xml (deflated 35%) 2025-12-04T12:52:20.7387370Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-be166d073ca55795.xml (deflated 35%) 2025-12-04T12:52:20.7388003Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c49bebe8dd8e46f5.xml (deflated 36%) 2025-12-04T12:52:20.7388564Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c0f82bf827258b5a.xml (deflated 35%) 2025-12-04T12:52:20.7389131Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-983f78b8ab7f02b9.xml (deflated 34%) 2025-12-04T12:52:20.7389697Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1ceb5c06515b90e6.xml (deflated 35%) 2025-12-04T12:52:20.7390260Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8871c4d5afa11da1.xml (deflated 35%) 2025-12-04T12:52:20.7390925Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-005dd80a1d165bd6.xml (deflated 35%) 2025-12-04T12:52:20.7391469Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b8dfb4b667064c84.xml (deflated 35%) 2025-12-04T12:52:20.7392022Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-279f10471d7d2185.xml (deflated 35%) 2025-12-04T12:52:20.7392557Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2732174555722247.xml (deflated 35%) 2025-12-04T12:52:20.7393131Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-39eff7f371737bec.xml (deflated 35%) 2025-12-04T12:52:20.7393689Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1bd2ca8c9ccb5d1d.xml (deflated 35%) 2025-12-04T12:52:20.7394235Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6de6dc4ebcbeafe7.xml (deflated 35%) 2025-12-04T12:52:20.7394791Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ef1db87d402e61e.xml (deflated 35%) 2025-12-04T12:52:20.7395336Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-07b0bb118974fac5.xml (deflated 35%) 2025-12-04T12:52:20.7395878Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b45dd1b51a454859.xml (deflated 34%) 2025-12-04T12:52:20.7396429Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c5c7f8af22555084.xml (deflated 35%) 2025-12-04T12:52:20.7396978Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0d5f0a9727dbb8c1.xml (deflated 35%) 2025-12-04T12:52:20.7397533Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-efc5df0b6f603c8e.xml (deflated 36%) 2025-12-04T12:52:20.7398103Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d3947ad663b8a0a5.xml (deflated 36%) 2025-12-04T12:52:20.7398650Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-9a8e98b254282d7c.xml (deflated 36%) 2025-12-04T12:52:20.7399194Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6f2a37a539841f4a.xml (deflated 36%) 2025-12-04T12:52:20.7399737Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-11da4acc57c01e3e.xml (deflated 36%) 2025-12-04T12:52:20.7400294Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-efa00c53d9ffafd0.xml (deflated 34%) 2025-12-04T12:52:20.7400842Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c139de0f468bfbd0.xml (deflated 34%) 2025-12-04T12:52:20.7401391Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-508c49e985b343e0.xml (deflated 35%) 2025-12-04T12:52:20.7401935Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-85c75a6ef966eda2.xml (deflated 34%) 2025-12-04T12:52:20.7402541Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d3b5821ebfa1d2b9.xml (deflated 34%) 2025-12-04T12:52:20.7403093Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-78c3c6f081c06510.xml (deflated 34%) 2025-12-04T12:52:20.7403642Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eceb8537f545c10b.xml (deflated 35%) 2025-12-04T12:52:20.7404195Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-997f8b36df0838da.xml (deflated 35%) 2025-12-04T12:52:20.7404750Z adding: test/test-reports/python-pytest/distributed.test_launcher/distributed.test_launcher-ab711efd5b5eae9c.xml (deflated 40%) 2025-12-04T12:52:20.7405272Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-4402e4ca07679d5e.xml (deflated 36%) 2025-12-04T12:52:20.7405801Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-59586d5fa8d9df00.xml (deflated 36%) 2025-12-04T12:52:20.7406320Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-904d5bf6ccd1c7aa.xml (deflated 37%) 2025-12-04T12:52:20.7406841Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c074807310bb3c83.xml (deflated 40%) 2025-12-04T12:52:20.7407385Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-8f207b69f7a673c5.xml (deflated 37%) 2025-12-04T12:52:20.7407897Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-cb7ce4d5a847e19b.xml (deflated 36%) 2025-12-04T12:52:20.7408416Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-52257f6acad204d5.xml (deflated 37%) 2025-12-04T12:52:20.7408924Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-b16442ffcab7dd38.xml (deflated 45%) 2025-12-04T12:52:20.7409447Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-a12f788b611f7140.xml (deflated 45%) 2025-12-04T12:52:20.7409958Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-dfaa32f264045766.xml (deflated 45%) 2025-12-04T12:52:20.7410470Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-8da17e1f6a078343.xml (deflated 45%) 2025-12-04T12:52:20.7410997Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-e8525e9dd27a79c3.xml (deflated 36%) 2025-12-04T12:52:20.7411508Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-6f41a4ce4013a7c2.xml (deflated 36%) 2025-12-04T12:52:20.7412059Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-0470d4ac72d7a50e.xml (deflated 37%) 2025-12-04T12:52:20.7412574Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-3a36dde009a45dc5.xml (deflated 37%) 2025-12-04T12:52:20.7413084Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-df81badd9797f785.xml (deflated 37%) 2025-12-04T12:52:20.7413663Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-350d6caf6618d2b5.xml (deflated 37%) 2025-12-04T12:52:20.7414371Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9a9af3abc1b0b41d.xml (deflated 37%) 2025-12-04T12:52:20.7414911Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-619ffbde41a10c5d.xml (deflated 37%) 2025-12-04T12:52:20.7415439Z adding: 
test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c5cf03e47a405c4b.xml (deflated 37%) 2025-12-04T12:52:20.7415973Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-af6e2b8d803b9c4f.xml (deflated 37%) 2025-12-04T12:52:20.7416576Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-97c2051ccee23a9b.xml (deflated 36%) 2025-12-04T12:52:20.7417106Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-482d04b678af0ece.xml (deflated 36%) 2025-12-04T12:52:20.7417642Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-10499bf9f759075b.xml (deflated 37%) 2025-12-04T12:52:20.7418170Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-cccae4cdf350788c.xml (deflated 36%) 2025-12-04T12:52:20.7418694Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9241ea60ff5c054f.xml (deflated 36%) 2025-12-04T12:52:20.7419232Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c77cdf590d6c4d53.xml (deflated 37%) 2025-12-04T12:52:20.7419756Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9d86817452e27e08.xml (deflated 35%) 2025-12-04T12:52:20.7420297Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-4b2a51af148732c1.xml (deflated 36%) 2025-12-04T12:52:20.7420823Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-36cc6892ce1d13b2.xml (deflated 36%) 2025-12-04T12:52:20.7421354Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-f3e053be23766a80.xml (deflated 36%) 2025-12-04T12:52:20.7421926Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c4831c850757caf9.xml (deflated 35%) 2025-12-04T12:52:20.7422452Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-15d93d0dff93e000.xml (deflated 36%) 2025-12-04T12:52:20.7423037Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-97c506133bdf82e2.xml (deflated 44%) 2025-12-04T12:52:20.7423576Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-760dbf0b3a7076aa.xml (deflated 44%) 2025-12-04T12:52:20.7424104Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-0c82075f9025f767.xml (deflated 44%) 2025-12-04T12:52:20.7424629Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-3e48901c9a873f45.xml (deflated 44%) 2025-12-04T12:52:20.7425164Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-7a3e24041d2ef943.xml (deflated 35%) 2025-12-04T12:52:20.7425801Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-04f97f132e861e46.xml (deflated 36%) 2025-12-04T12:52:20.7426319Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-de7756f9c1641da6.xml (deflated 36%) 2025-12-04T12:52:20.7426870Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-1bca8b988eb41bac.xml (deflated 36%) 2025-12-04T12:52:20.7427388Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-286102c30fda404f.xml (deflated 36%) 2025-12-04T12:52:20.7427915Z adding: 
test/test-reports/python-pytest/distributed.test_store/distributed.test_store-bedd718db983bac0.xml (deflated 36%) 2025-12-04T12:52:20.7428428Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-eefe267c1e87f355.xml (deflated 36%) 2025-12-04T12:52:20.7428951Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-75d254d8d2b940a0.xml (deflated 40%) 2025-12-04T12:52:20.7429466Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-d29ba7e8ccb2ecb1.xml (deflated 36%) 2025-12-04T12:52:20.7429984Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-f94b0a13c28491ca.xml (deflated 36%) 2025-12-04T12:52:20.7430504Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-d359118932c9b995.xml (deflated 36%) 2025-12-04T12:52:20.7431082Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-f756c8323a1d09e8.xml (deflated 35%) 2025-12-04T12:52:20.7431605Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-1240aa89fcaf1417.xml (deflated 34%) 2025-12-04T12:52:20.7432115Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-d2b8c5d98b2db0d3.xml (deflated 37%) 2025-12-04T12:52:20.7432629Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-f04571bca3f6577a.xml (deflated 36%) 2025-12-04T12:52:20.7433145Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-62a8c36072d028e3.xml (deflated 45%) 2025-12-04T12:52:20.7433653Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-085d0338122bdd88.xml (deflated 44%) 2025-12-04T12:52:20.7434174Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-3aff210f96c86539.xml (deflated 44%) 2025-12-04T12:52:20.7434683Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-50126680f72685fd.xml (deflated 44%) 2025-12-04T12:52:20.7435197Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-265b5fcb7c5f4add.xml (deflated 37%) 2025-12-04T12:52:20.7435752Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-658bdaf47e5d9fc0.xml (deflated 35%) 2025-12-04T12:52:20.7436262Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9359cc0c923fd357.xml (deflated 35%) 2025-12-04T12:52:20.7436788Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-8f02a426ff307186.xml (deflated 36%) 2025-12-04T12:52:20.7437302Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-724b43aaa3e86430.xml (deflated 35%) 2025-12-04T12:52:20.7437816Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-849142fee7d3fe7a.xml (deflated 35%) 2025-12-04T12:52:20.7438338Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-40597616ec98a508.xml (deflated 34%) 2025-12-04T12:52:20.7438852Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-8de5caa0f44ab195.xml (deflated 35%) 2025-12-04T12:52:20.7439376Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-70a2189ada91e7b4.xml (deflated 35%) 2025-12-04T12:52:20.7439886Z adding: 
test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c9872634fae2a2a2.xml (deflated 35%) 2025-12-04T12:52:20.7440392Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-a2053711ae870746.xml (deflated 35%) 2025-12-04T12:52:20.7440944Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-4ef2fc9fec34c264.xml (deflated 40%) 2025-12-04T12:52:20.7441461Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-b4a7a6fe6b411ab3.xml (deflated 35%) 2025-12-04T12:52:20.7441972Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-3a29202ead173617.xml (deflated 34%) 2025-12-04T12:52:20.7442483Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-a61155d6f938b2cc.xml (deflated 34%) 2025-12-04T12:52:20.7442992Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9b41f59ee0cfeb75.xml (deflated 34%) 2025-12-04T12:52:20.7443506Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-307094c901db62b6.xml (deflated 34%) 2025-12-04T12:52:20.7444016Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c3332bb3687882a6.xml (deflated 35%) 2025-12-04T12:52:20.7444539Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-422a006275d6f6d2.xml (deflated 35%) 2025-12-04T12:52:20.7445104Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-b842f6182997ffac.xml (deflated 35%) 2025-12-04T12:52:20.7445620Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-b78401240c2392a0.xml (deflated 34%) 2025-12-04T12:52:20.7446136Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-05d0a7320ac2f2e5.xml (deflated 34%) 2025-12-04T12:52:20.7446651Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-56eae3d52dfef9e0.xml (deflated 34%) 2025-12-04T12:52:20.7447166Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9598d85b65a1dd25.xml (deflated 35%) 2025-12-04T12:52:20.7447680Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-9433dc4d80f4e3fb.xml (deflated 34%) 2025-12-04T12:52:20.7448209Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-fbcb32de5a4aaa3b.xml (deflated 35%) 2025-12-04T12:52:20.7448724Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-c2ec29ec8ed5fa00.xml (deflated 35%) 2025-12-04T12:52:20.7449236Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-7fbbe4d1eb982186.xml (deflated 34%) 2025-12-04T12:52:20.7449785Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-fd140e21219ecfa7.xml (deflated 34%) 2025-12-04T12:52:20.7450299Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-db87db829f00dbc2.xml (deflated 36%) 2025-12-04T12:52:20.7450817Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-0645df0da2606eed.xml (deflated 36%) 2025-12-04T12:52:20.7451334Z adding: test/test-reports/python-pytest/distributed.test_store/distributed.test_store-5be93906577b570a.xml (deflated 36%) 2025-12-04T12:52:20.7451845Z adding: 
test/test-reports/python-pytest/distributed.test_store/distributed.test_store-cb158e2a6356a16b.xml (deflated 36%)
[zip listing condensed: a long run of similar "adding: test/test-reports/.../<suite>-<hash>.xml (deflated 28-81%)" entries follows, covering the JUnit XML reports for distributed.test_store, distributed.test_c10d_nccl, the distributed.elastic.* suites (events, metrics, timer, utils), and distributed.test_distributed_spawn under dist-mpi-init-env, dist-mpi-init-file, dist-nccl-init-env, dist-nccl-init-file, and dist-gloo-init-env]
2025-12-04T12:52:20.7734090Z adding:
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9afcc52ba56cb7cd.xml (deflated 35%) 2025-12-04T12:52:20.7734782Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5c38c0e5b27dc65d.xml (deflated 36%) 2025-12-04T12:52:20.7735483Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aaba21dbf0cfec28.xml (deflated 43%) 2025-12-04T12:52:20.7736255Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-01e27bee5aad4037.xml (deflated 44%) 2025-12-04T12:52:20.7736942Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc4d7295af7e1928.xml (deflated 36%) 2025-12-04T12:52:20.7737628Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-520313bd404147de.xml (deflated 43%) 2025-12-04T12:52:20.7738307Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a797af99437b20e.xml (deflated 45%) 2025-12-04T12:52:20.7738997Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-37d6ef52a05b7d44.xml (deflated 44%) 2025-12-04T12:52:20.7739677Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2315e451ee36d9f1.xml (deflated 45%) 2025-12-04T12:52:20.7740440Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ecd0d14f056dd21.xml (deflated 35%) 2025-12-04T12:52:20.7741123Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1beae1b9515a1c25.xml (deflated 43%) 2025-12-04T12:52:20.7741803Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78d4acbddf3c9607.xml (deflated 35%) 2025-12-04T12:52:20.7742488Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c1a59c2fbd776179.xml (deflated 42%) 2025-12-04T12:52:20.7743168Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-31b3b5953cdc94eb.xml (deflated 36%) 2025-12-04T12:52:20.7743859Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0c79eda599deb55f.xml (deflated 36%) 2025-12-04T12:52:20.7744539Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f54745fb5e614031.xml (deflated 35%) 2025-12-04T12:52:20.7745213Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8182c37c5ee80d72.xml (deflated 36%) 2025-12-04T12:52:20.7746051Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ab8cc14caa3fed1d.xml (deflated 36%) 2025-12-04T12:52:20.7746690Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ffbdb3959954615b.xml (deflated 35%) 2025-12-04T12:52:20.7747339Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cef388e39996bffb.xml (deflated 35%) 2025-12-04T12:52:20.7748066Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-969953b595bcabd5.xml (deflated 36%) 2025-12-04T12:52:20.7748675Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4f0371af0c88c192.xml (deflated 36%) 2025-12-04T12:52:20.7749273Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-06587ef8e124e088.xml (deflated 36%) 2025-12-04T12:52:20.7749880Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-85f349019e7692ee.xml (deflated 36%) 2025-12-04T12:52:20.7750497Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7da4ea5ccfcc936b.xml (deflated 45%) 2025-12-04T12:52:20.7751125Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0583cccd09b9e0c9.xml (deflated 45%) 2025-12-04T12:52:20.7751735Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-940ee9ff608e1e76.xml (deflated 35%) 2025-12-04T12:52:20.7752337Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-048493e6df7d5c5f.xml (deflated 35%) 2025-12-04T12:52:20.7752937Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9f94739f99a45e5b.xml (deflated 35%) 2025-12-04T12:52:20.7753548Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5d366f728b719f9d.xml (deflated 36%) 2025-12-04T12:52:20.7754148Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1bd04d92534f968b.xml (deflated 36%) 2025-12-04T12:52:20.7754762Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b2b4bdf6b2f4c521.xml (deflated 36%) 2025-12-04T12:52:20.7755411Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-50ac6d99d2e163f0.xml (deflated 35%) 2025-12-04T12:52:20.7756017Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-98ebf2381cda380a.xml (deflated 56%) 2025-12-04T12:52:20.7756620Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-02a17fffec5d92e9.xml (deflated 36%) 2025-12-04T12:52:20.7757227Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22fa4be36d0f0fec.xml (deflated 36%) 2025-12-04T12:52:20.7757834Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b20781e7e3b68cd0.xml (deflated 36%) 2025-12-04T12:52:20.7758437Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-055ede09a7156885.xml (deflated 45%) 2025-12-04T12:52:20.7759041Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2dc318e9e7472b33.xml (deflated 44%) 2025-12-04T12:52:20.7759643Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f6d75222f185b4d5.xml (deflated 42%) 2025-12-04T12:52:20.7760272Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c6a79d484e935ea.xml (deflated 42%) 2025-12-04T12:52:20.7760876Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-723363f561b2e381.xml (deflated 36%) 2025-12-04T12:52:20.7761481Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cfa70be2e3e2e968.xml (deflated 36%) 2025-12-04T12:52:20.7762094Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c34de188341d059e.xml (deflated 56%) 2025-12-04T12:52:20.7762699Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f3f32df0d2792ec.xml (deflated 36%) 2025-12-04T12:52:20.7763314Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ae835e0e1dbfe3ec.xml (deflated 37%) 2025-12-04T12:52:20.7763916Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41f83edcff84215a.xml (deflated 36%) 2025-12-04T12:52:20.7764543Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a860decd20224034.xml (deflated 36%) 2025-12-04T12:52:20.7765153Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3963d6f58e4804f4.xml (deflated 35%) 2025-12-04T12:52:20.7765753Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b2893112b7e4a0fd.xml (deflated 36%) 2025-12-04T12:52:20.7766356Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5ae1d3298b84e247.xml (deflated 45%) 2025-12-04T12:52:20.7766960Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ce25bc0bb1b2125d.xml (deflated 37%) 2025-12-04T12:52:20.7767570Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b733f78747bdf8ab.xml (deflated 36%) 2025-12-04T12:52:20.7768175Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f13a1853be14fff2.xml (deflated 35%) 2025-12-04T12:52:20.7768873Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e0c8a225205b2b1b.xml (deflated 35%) 2025-12-04T12:52:20.7769479Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e42484012944fc8.xml (deflated 36%) 2025-12-04T12:52:20.7770076Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-76186faceb0b0d55.xml (deflated 43%) 2025-12-04T12:52:20.7770686Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a9b9276398f16f1.xml (deflated 36%) 2025-12-04T12:52:20.7771287Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c78e67ea2df6209d.xml (deflated 36%) 2025-12-04T12:52:20.7771890Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7358e80ff8c5c8ba.xml (deflated 35%) 2025-12-04T12:52:20.7772501Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6b10a6c1a1360dd8.xml (deflated 36%) 2025-12-04T12:52:20.7773101Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4cf0679388a3449b.xml (deflated 36%) 2025-12-04T12:52:20.7773973Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-26580fbd1387c903.xml (deflated 35%) 2025-12-04T12:52:20.7774650Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a5db097ce06fd3ba.xml (deflated 43%) 2025-12-04T12:52:20.7775337Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5f966f935cdcc510.xml (deflated 36%) 2025-12-04T12:52:20.7776017Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d4a30815173ad737.xml (deflated 43%) 2025-12-04T12:52:20.7776695Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41db9b895b480d7b.xml (deflated 36%) 2025-12-04T12:52:20.7777377Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78a881039e14b5b2.xml (deflated 36%) 2025-12-04T12:52:20.7778057Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dd62c820cf2113b8.xml (deflated 36%) 2025-12-04T12:52:20.7778943Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c61700ab79aac30.xml (deflated 43%) 2025-12-04T12:52:20.7779668Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aae9f946365ef7f7.xml (deflated 45%) 2025-12-04T12:52:20.7780351Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d551f94bc57d8aff.xml (deflated 36%) 2025-12-04T12:52:20.7781040Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dcae987a61c4873a.xml (deflated 36%) 2025-12-04T12:52:20.7781718Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ca6886c3086a09a1.xml (deflated 36%) 2025-12-04T12:52:20.7782407Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aa0548b0a0f16fd5.xml (deflated 36%) 2025-12-04T12:52:20.7783082Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b837bfe0a5661087.xml (deflated 35%) 2025-12-04T12:52:20.7783837Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-42e3e45e9107b083.xml (deflated 36%) 2025-12-04T12:52:20.7784509Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2e64285e29d874e0.xml (deflated 36%) 2025-12-04T12:52:20.7785187Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4a385e35494820f4.xml (deflated 36%) 2025-12-04T12:52:20.7785878Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9cc203997a753f09.xml (deflated 36%) 2025-12-04T12:52:20.7786559Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0913cee8f07dc0af.xml (deflated 35%) 2025-12-04T12:52:20.7787250Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f5fb0cf096433beb.xml (deflated 43%) 2025-12-04T12:52:20.7787930Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f4d28c8a1a46915c.xml (deflated 36%) 2025-12-04T12:52:20.7788608Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b653d29a66c2470a.xml (deflated 35%) 2025-12-04T12:52:20.7789320Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e7285ab6b5c9527e.xml (deflated 43%) 2025-12-04T12:52:20.7789996Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a54941f41b8cfe8.xml (deflated 45%) 2025-12-04T12:52:20.7790781Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-91e59cc2a798549a.xml (deflated 36%) 2025-12-04T12:52:20.7791381Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e72c23544464af54.xml (deflated 42%) 2025-12-04T12:52:20.7791991Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-45f80f9c137d75a5.xml (deflated 37%) 2025-12-04T12:52:20.7792591Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9d93447ba16fc454.xml (deflated 36%) 2025-12-04T12:52:20.7793193Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-24191bbc1a349edf.xml (deflated 36%) 2025-12-04T12:52:20.7812006Z ##[group]Run # Remove any previous usage logs if they exist 2025-12-04T12:52:20.7812194Z # Remove any previous usage logs if they exist 2025-12-04T12:52:20.7812344Z rm -f logs-*.zip 2025-12-04T12:52:20.7812543Z zip "logs-${FILE_SUFFIX}.zip" 'usage_log.txt' || true 2025-12-04T12:52:20.7812769Z zip -r "logs-${FILE_SUFFIX}.zip" test/test-reports -i '*.log' || true 2025-12-04T12:52:20.7818639Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:52:20.7818731Z env: 2025-12-04T12:52:20.7818849Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:20.7818948Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:20.7819131Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:20.7819460Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:20.7819775Z FILE_SUFFIX: 
test-distributed-1-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084892 2025-12-04T12:52:20.7819878Z ##[endgroup] 2025-12-04T12:52:20.7881312Z adding: usage_log.txt (deflated 58%) 2025-12-04T12:52:20.7982782Z adding: test/test-reports/distributed.test_dynamo_distributed_1.2_812e4aab4ac83592_.log (deflated 89%) 2025-12-04T12:52:20.7998187Z adding: test/test-reports/distributed.fsdp.test_fsdp_apply_1.1_ffe46bf2b700541c_.log (deflated 95%) 2025-12-04T12:52:20.7998686Z adding: test/test-reports/distributed.elastic.utils.util_test_1.1_c7771d0b50d6c4b7_.log (deflated 75%) 2025-12-04T12:52:20.8004625Z adding: test/test-reports/distributed.fsdp.test_fsdp_multiple_wrapping_1.1_4a76a0d00df8da58_.log (deflated 95%) 2025-12-04T12:52:20.8023934Z adding: test/test-reports/distributed.fsdp.test_fsdp_fine_tune_1.1_200ce5473d48270d_.log (deflated 96%) 2025-12-04T12:52:20.8137050Z adding: test/test-reports/distributed.fsdp.test_fsdp_dtensor_state_dict_1.1_e652baa949161530_.log (deflated 97%) 2025-12-04T12:52:20.8403232Z adding: test/test-reports/distributed.fsdp.test_fsdp_core_1.2_d577d9d07b48d18d_.log (deflated 96%) 2025-12-04T12:52:20.8427812Z adding: test/test-reports/distributed.test_c10d_nccl_1.3_dda30713f52ab06d_.log (deflated 92%) 2025-12-04T12:52:20.8428407Z adding: test/test-reports/distributed.algorithms.ddp_comm_hooks.test_ddp_hooks_1.1_60114f563500ace9_.log (deflated 78%) 2025-12-04T12:52:20.8428813Z adding: test/test-reports/distributed.tensor.test_op_schema_1.1_d011062119cfcbab_.log (deflated 58%) 2025-12-04T12:52:20.8429256Z adding: test/test-reports/distributed.checkpoint.test_nested_dict_1.1_997e08555820f5c0_.log (deflated 58%) 2025-12-04T12:52:20.8429674Z adding: test/test-reports/distributed.elastic.metrics.api_test_1.1_6dbe93286f341ee5_.log (deflated 62%) 2025-12-04T12:52:20.8430180Z adding: test/test-reports/distributed.checkpoint.test_consolidate_hf_safetensors_1.1_58298622ce8a96a1_.log (deflated 84%) 2025-12-04T12:52:20.8430819Z adding: test/test-reports/distributed.checkpoint._experimental.test_barriers_1.1_b9cec98a3229522e_.log (deflated 60%) 2025-12-04T12:52:20.8431254Z adding: test/test-reports/distributed.pipelining.test_transformer_1.1_f2f1238de8a1b675_.log (deflated 54%) 2025-12-04T12:52:20.8432028Z adding: test/test-reports/distributed.flight_recorder.test_fr_analysis_1.1_e16708747b4c5449_.log (deflated 64%) 2025-12-04T12:52:20.8432653Z adding: test/test-reports/distributed._composable.test_contract_1.1_5025aa03747f10c7_.log (deflated 66%) 2025-12-04T12:52:20.8433383Z adding: test/test-reports/distributed.checkpoint.test_dedup_tensors_1.1_0e84935618c63fcf_.log (deflated 52%) 2025-12-04T12:52:20.8433832Z adding: test/test-reports/distributed.pipelining.test_pipe_1.1_59bc2fd17ffed9fc_.log (deflated 61%) 2025-12-04T12:52:20.8434544Z adding: test/test-reports/distributed.pipelining.test_backward_1.1_0bf2315297ac9ba7_.log (deflated 69%) 2025-12-04T12:52:20.8436069Z adding: test/test-reports/distributed.test_nvshmem_triton_1.1_f614bcc3c29a51a4_.log (deflated 89%) 2025-12-04T12:52:20.8461477Z adding: test/test-reports/distributed.tensor.test_dtensor_1.1_8ef92beff0d7f6af_.log (deflated 95%) 2025-12-04T12:52:20.8461861Z adding: test/test-reports/distributed.test_p2p_ipc_1.1_b1c3fbf05590da79_.log (deflated 51%) 2025-12-04T12:52:20.8462627Z adding: test/test-reports/distributed.tensor.test_common_rules_1.1_14dc10d88c04ca47_.log (deflated 74%) 2025-12-04T12:52:20.8464159Z adding: test/test-reports/distributed.checkpoint.test_hf_safetensor_e2e_1.1_5f8a368983958374_.log (deflated 83%) 
2025-12-04T12:52:20.8465193Z adding: test/test-reports/distributed.tensor.test_dynamic_1.1_6f95f3474f81ab92_.log (deflated 73%) 2025-12-04T12:52:20.8466251Z adding: test/test-reports/distributed.checkpoint.e2e.test_fsdp_ep_1.1_f9c977572fffaad5_.log (deflated 72%) 2025-12-04T12:52:20.8467016Z adding: test/test-reports/distributed.pipelining.test_unflatten_1.1_d494e6526239495e_.log (deflated 54%) 2025-12-04T12:52:20.8468798Z adding: test/test-reports/distributed.tensor.test_dtensor_testbase_1.1_125742ee6f314706_.log (deflated 92%) 2025-12-04T12:52:20.8472593Z adding: test/test-reports/distributed.tensor.test_redistribute_1.2_4b7d9ba5bb6931ec_.log (deflated 90%) 2025-12-04T12:52:20.8474372Z adding: test/test-reports/distributed.test_nvshmem_1.1_42a48f707fc7fbe7_.log (deflated 89%) 2025-12-04T12:52:20.8475737Z adding: test/test-reports/distributed.tensor.test_attention_1.1_cb19c8955a160060_.log (deflated 80%) 2025-12-04T12:52:20.8477083Z adding: test/test-reports/distributed.tensor.test_convolution_ops_1.1_49a43fa49632a258_.log (deflated 83%) 2025-12-04T12:52:20.8481251Z adding: test/test-reports/distributed.checkpoint.fsdp.test_fsdp_dsd_1.1_f8fe2a82b83f915c_.log (deflated 92%) 2025-12-04T12:52:20.8481736Z adding: test/test-reports/distributed.checkpoint.test_save_load_api_1.1_c6bb8dfab455cfa5_.log (deflated 63%) 2025-12-04T12:52:20.8483518Z adding: test/test-reports/distributed.tensor.debug.test_comm_mode_features_1.1_03fdf3eadd2ab611_.log (deflated 88%) 2025-12-04T12:52:20.8484060Z adding: test/test-reports/distributed.tensor.test_dtensor_ops_1.1_67309fc460535665_.log (deflated 51%) 2025-12-04T12:52:20.8484782Z adding: test/test-reports/distributed.test_debug_1.1_a56c6136c9a8abc4_.log (deflated 68%) 2025-12-04T12:52:20.8485747Z adding: test/test-reports/distributed.test_overlap_bucketing_unit_1.1_9dbb1a52f33d29c7_.log (deflated 76%) 2025-12-04T12:52:20.8486492Z adding: test/test-reports/distributed.elastic.events.lib_test_1.1_7071ab3e44d7ad6e_.log (deflated 72%) 2025-12-04T12:52:20.8487464Z adding: test/test-reports/distributed.checkpoint._experimental.test_checkpoint_writer_1.1_1dd840a86f907337_.log (deflated 76%) 2025-12-04T12:52:20.8487862Z adding: test/test-reports/distributed.elastic.timer.api_test_1.1_13380cf2203031af_.log (stored 0%) 2025-12-04T12:52:20.8488630Z adding: test/test-reports/distributed.checkpoint._experimental.test_checkpointer_1.1_7614bf1ed13f1d86_.log (deflated 79%) 2025-12-04T12:52:20.8489884Z adding: test/test-reports/distributed.tensor.test_init_1.1_e83e12e555837820_.log (deflated 83%) 2025-12-04T12:52:20.8490583Z adding: test/test-reports/distributed._composable.test_checkpoint_1.1_abf6f06f0264530a_.log (deflated 69%) 2025-12-04T12:52:20.8491642Z adding: test/test-reports/distributed._tools.test_fsdp2_mem_tracker_1.1_26e469db93fd8e16_.log (deflated 75%) 2025-12-04T12:52:20.8492512Z adding: test/test-reports/distributed.elastic.timer.local_timer_example_1.1_4b68dab99362c847_.log (deflated 68%) 2025-12-04T12:52:20.8493865Z adding: test/test-reports/distributed._composable.test_replicate_mixed_precision_1.1_01fba1f975bb97b6_.log (deflated 79%) 2025-12-04T12:52:20.8498541Z adding: test/test-reports/distributed.checkpoint.e2e.test_fine_tuning_1.1_0eddc041b8455dc1_.log (deflated 94%) 2025-12-04T12:52:20.8500996Z adding: test/test-reports/distributed.tensor.test_matrix_ops_1.1_08575986e567d5c0_.log (deflated 85%) 2025-12-04T12:52:20.8503390Z adding: test/test-reports/distributed.tensor.test_optimizers_1.1_7e5f1be8728dbdea_.log (deflated 89%) 2025-12-04T12:52:20.8506954Z 
adding: test/test-reports/distributed.test_symmetric_memory_1.1_64f49cd6a7ec957a_.log (deflated 92%) 2025-12-04T12:52:20.8507637Z adding: test/test-reports/distributed._tools.test_runtime_estimator_1.1_65959848b6aa0401_.log (deflated 59%) 2025-12-04T12:52:20.8508501Z adding: test/test-reports/distributed.elastic.timer.local_timer_test_1.1_d1c67eef8711d433_.log (deflated 77%) 2025-12-04T12:52:20.8510258Z adding: test/test-reports/distributed._composable.test_replicate_with_compiler_1.1_eea3e5fa3a5dfc0a_.log (deflated 80%) 2025-12-04T12:52:20.8511493Z adding: test/test-reports/distributed._composable.fsdp.test_fully_shard_autograd_1.1_8ee07b7d86524ab5_.log (deflated 79%) 2025-12-04T12:52:20.8529214Z adding: test/test-reports/distributed.test_store_1.1_9c10cd7a4c8d4bdb_.log (deflated 95%) 2025-12-04T12:52:20.8535903Z adding: test/test-reports/distributed._composable.test_composability.test_2d_composability_1.1_35a52faaad7dc617_.log (deflated 93%) 2025-12-04T12:52:20.8595788Z adding: test/test-reports/distributed.fsdp.test_fsdp_optim_state_1.1_7d67a9c100256545_.log (deflated 98%) 2025-12-04T12:52:20.8596247Z adding: test/test-reports/distributed.test_c10d_logger_1.1_0a52c9bc30c9920f_.log (deflated 65%) 2025-12-04T12:52:20.8597521Z adding: test/test-reports/distributed._composable.test_replicate_training_1.1_cc04659fdefd418f_.log (deflated 86%) 2025-12-04T12:52:20.8597993Z adding: test/test-reports/distributed.optim.test_apply_optimizer_in_backward_1.1_13991a8c54f44830_.log (stored 0%) 2025-12-04T12:52:20.8598550Z adding: test/test-reports/distributed.rpc.test_share_memory_1.1_b0f1b7712293917c_.log (deflated 58%) 2025-12-04T12:52:20.8600503Z adding: test/test-reports/distributed.tensor.test_op_strategy_1.1_1fa2e02695839b56_.log (deflated 87%) 2025-12-04T12:52:20.8602919Z adding: test/test-reports/distributed.fsdp.test_fsdp_grad_acc_1.1_4abfd3a00d3824ee_.log (deflated 90%) 2025-12-04T12:52:20.8605778Z adding: test/test-reports/distributed.checkpoint.test_state_dict_stager_1.1_039a3cb2334b1e56_.log (deflated 92%) 2025-12-04T12:52:20.8611727Z adding: test/test-reports/distributed.fsdp.test_fsdp_freezing_weights_1.1_1d6984042c8f2cfc_.log (deflated 94%) 2025-12-04T12:52:20.8614030Z adding: test/test-reports/distributed._composable.fsdp.test_fully_shard_init_1.1_313b8ba1dd39b14e_.log (deflated 86%) 2025-12-04T12:52:20.8616366Z adding: test/test-reports/distributed.fsdp.test_fsdp_flatten_params_1.1_076e3197ee747eb8_.log (deflated 83%) 2025-12-04T12:52:20.8616789Z adding: test/test-reports/distributed.test_distributed_spawn_3.9_23c96a6f8ddde9df_.log (deflated 12%) 2025-12-04T12:52:20.8617212Z adding: test/test-reports/distributed.test_distributed_spawn_3.9_e48186af63d3ea09_.log (deflated 12%) 2025-12-04T12:52:20.8617631Z adding: test/test-reports/distributed.test_distributed_spawn_3.9_09a29604c4f1582f_.log (deflated 82%) 2025-12-04T12:52:20.8618382Z adding: test/test-reports/distributed.test_distributed_spawn_3.9_cf9680aaa50be142_.log (deflated 82%) 2025-12-04T12:52:20.8626484Z adding: test/test-reports/distributed.test_distributed_spawn_3.9_a127edfd2b1d71a7_.log (deflated 93%) 2025-12-04T12:52:20.8634010Z adding: test/test-reports/distributed.test_distributed_spawn_3.9_e2f3a9d90cd48f40_.log (deflated 93%) 2025-12-04T12:52:20.8642656Z adding: test/test-reports/distributed.test_distributed_spawn_3.9_e45ee060484710ae_.log (deflated 94%) 2025-12-04T12:52:20.8651286Z adding: test/test-reports/distributed.test_distributed_spawn_3.9_7d3f43b9506a343a_.log (deflated 94%) 2025-12-04T12:52:20.8651857Z adding: 
test/test-reports/distributed.test_distributed_spawn_6.9_3bf78f0c79d16c4d_.log (deflated 12%) 2025-12-04T12:52:20.8652253Z adding: test/test-reports/distributed.test_distributed_spawn_6.9_8c693fd2b3945c2d_.log (deflated 12%) 2025-12-04T12:52:20.8652657Z adding: test/test-reports/distributed.test_distributed_spawn_6.9_a700db43a456c251_.log (deflated 82%) 2025-12-04T12:52:20.8653443Z adding: test/test-reports/distributed.test_distributed_spawn_6.9_e8872495f8ff89db_.log (deflated 83%) 2025-12-04T12:52:20.8662369Z adding: test/test-reports/distributed.test_distributed_spawn_6.9_dcec8ae9e33ccbd0_.log (deflated 92%) 2025-12-04T12:52:20.8670891Z adding: test/test-reports/distributed.test_distributed_spawn_6.9_f7edb6118d7b5472_.log (deflated 92%) 2025-12-04T12:52:20.8680510Z adding: test/test-reports/distributed.test_distributed_spawn_6.9_b2d6c55f75e2baa8_.log (deflated 93%) 2025-12-04T12:52:20.8690338Z adding: test/test-reports/distributed.test_distributed_spawn_6.9_85d7f75cf0274716_.log (deflated 93%) 2025-12-04T12:52:20.8690775Z adding: test/test-reports/distributed.test_distributed_spawn_9.9_05322304b4268f45_.log (deflated 12%) 2025-12-04T12:52:20.8691293Z adding: test/test-reports/distributed.test_distributed_spawn_9.9_e1495444ab48b4d4_.log (deflated 12%) 2025-12-04T12:52:20.8691951Z adding: test/test-reports/distributed.test_distributed_spawn_9.9_9a435d3db962ff09_.log (deflated 82%) 2025-12-04T12:52:20.8692405Z adding: test/test-reports/distributed.test_distributed_spawn_9.9_8a0f3583325b9bd1_.log (deflated 82%) 2025-12-04T12:52:20.8699510Z adding: test/test-reports/distributed.test_distributed_spawn_9.9_b9cbe2f6b7b91336_.log (deflated 94%) 2025-12-04T12:52:20.8706427Z adding: test/test-reports/distributed.test_distributed_spawn_9.9_5b3b256f80a196fa_.log (deflated 94%) 2025-12-04T12:52:20.8715016Z adding: test/test-reports/distributed.test_distributed_spawn_9.9_46e43edcce57a01b_.log (deflated 94%) 2025-12-04T12:52:20.8723772Z adding: test/test-reports/distributed.test_distributed_spawn_9.9_b2ce0dce90fd794f_.log (deflated 94%) 2025-12-04T12:52:20.8724542Z adding: test/test-reports/distributed.test_composability_1.1_47908043dcf692eb_.log (deflated 76%) 2025-12-04T12:52:20.8726345Z adding: test/test-reports/distributed.test_multi_threaded_pg_1.1_0640315784817164_.log (deflated 87%) 2025-12-04T12:52:20.8729831Z adding: test/test-reports/distributed.elastic.utils.distributed_test_1.1_2f07a4b12f9c1ea3_.log (deflated 93%) 2025-12-04T12:52:20.8730734Z adding: test/test-reports/distributed._composable.fsdp.test_fully_shard_extensions_1.1_670c155f675be16b_.log (deflated 70%) 2025-12-04T12:52:20.8736419Z adding: test/test-reports/distributed.fsdp.test_wrap_1.1_9a143bedfca4724f_.log (deflated 93%) 2025-12-04T12:52:20.8739877Z adding: test/test-reports/distributed.fsdp.test_fsdp_hybrid_shard_1.1_393c4bfd443690e9_.log (deflated 92%) 2025-12-04T12:52:20.8740557Z adding: test/test-reports/distributed.elastic.utils.logging_test_1.1_aa6947c2b0a1b352_.log (deflated 58%) 2025-12-04T12:52:20.9562370Z adding: test/test-reports/distributed._composable.fsdp.test_fully_shard_training_1.1_6ed6432c508bcf99_.log (deflated 95%) 2025-12-04T12:52:20.9582697Z adding: test/test-reports/distributed.rpc.cuda.test_tensorpipe_agent_1.2_384a5ff4692986ce_.log (deflated 96%) 2025-12-04T12:52:20.9597125Z adding: test/test-reports/distributed.optim.test_zero_redundancy_optimizer_1.1_1138196092c61589_.log (deflated 95%) 2025-12-04T12:52:20.9597699Z adding: 
test/test-reports/distributed.rpc.test_tensorpipe_agent_1.1_e870d0f3c8f660e1_.log (stored 0%) 2025-12-04T12:52:20.9632545Z adding: test/test-reports/distributed.test_c10d_gloo_2.2_9367ba993beea467_.log (deflated 96%) 2025-12-04T12:52:20.9633002Z adding: test/test-reports/distributed.test_launcher_1.1_d44269eab8f8d94e_.log (deflated 58%) 2025-12-04T12:52:20.9658408Z ##[group]Run # Remove any previous debugging artifacts if they exist 2025-12-04T12:52:20.9658635Z # Remove any previous debugging artifacts if they exist 2025-12-04T12:52:20.9658746Z rm -f debug-*.zip 2025-12-04T12:52:20.9658876Z if [ -d 'test/debug' ]; then 2025-12-04T12:52:20.9659039Z  zip -r "debug-${FILE_SUFFIX}.zip" test/debug 2025-12-04T12:52:20.9659135Z fi 2025-12-04T12:52:20.9664770Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:52:20.9664871Z env: 2025-12-04T12:52:20.9664981Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:20.9665079Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:20.9665267Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:20.9665706Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:20.9666009Z FILE_SUFFIX: test-distributed-1-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084892 2025-12-04T12:52:20.9666180Z ##[endgroup] 2025-12-04T12:52:20.9745081Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T12:52:20.9745175Z with: 2025-12-04T12:52:20.9745290Z s3-bucket: gha-artifacts 2025-12-04T12:52:20.9745467Z s3-prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:52:20.9745681Z retention-days: 14 2025-12-04T12:52:20.9745794Z if-no-files-found: warn 2025-12-04T12:52:20.9745898Z path: test-jsons-*.zip 2025-12-04T12:52:20.9745986Z name: artifact 2025-12-04T12:52:20.9746198Z region: us-east-1 2025-12-04T12:52:20.9746281Z env: 2025-12-04T12:52:20.9746386Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:20.9746492Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:20.9746661Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:20.9746972Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:20.9747067Z ##[endgroup] 2025-12-04T12:52:21.3618668Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T12:52:21.3619244Z With the provided path, there will be 1 file uploaded 2025-12-04T12:52:21.3619755Z Uploading to s3 prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:52:21.3662377Z Starting upload of test-jsons-test-distributed-1-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084892.zip 2025-12-04T12:52:21.5478123Z Finished upload of test-jsons-test-distributed-1-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084892.zip 2025-12-04T12:52:21.5649039Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T12:52:21.5649397Z with: 2025-12-04T12:52:21.5649642Z s3-bucket: gha-artifacts 2025-12-04T12:52:21.5650009Z s3-prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:52:21.5650379Z retention-days: 14 2025-12-04T12:52:21.5650662Z if-no-files-found: error 2025-12-04T12:52:21.5650967Z path: test-reports-*.zip 2025-12-04T12:52:21.5651258Z name: artifact 2025-12-04T12:52:21.5651492Z region: us-east-1 2025-12-04T12:52:21.5651739Z env: 2025-12-04T12:52:21.5651974Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:21.5652249Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:21.5652593Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:21.5653205Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:21.5654033Z 
##[endgroup] 2025-12-04T12:52:21.9086518Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T12:52:21.9087079Z With the provided path, there will be 1 file uploaded 2025-12-04T12:52:21.9087608Z Uploading to s3 prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:52:21.9130002Z Starting upload of test-reports-test-distributed-1-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084892.zip 2025-12-04T12:52:22.1069734Z Finished upload of test-reports-test-distributed-1-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084892.zip 2025-12-04T12:52:22.1250312Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T12:52:22.1250668Z with: 2025-12-04T12:52:22.1250914Z s3-bucket: gha-artifacts 2025-12-04T12:52:22.1251282Z s3-prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:52:22.1251653Z retention-days: 14 2025-12-04T12:52:22.1251930Z if-no-files-found: ignore 2025-12-04T12:52:22.1252227Z path: logs-*.zip 2025-12-04T12:52:22.1252465Z name: artifact 2025-12-04T12:52:22.1252724Z region: us-east-1 2025-12-04T12:52:22.1252974Z env: 2025-12-04T12:52:22.1253201Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:22.1253588Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:22.1254120Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:22.1254816Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:22.1255387Z ##[endgroup] 2025-12-04T12:52:22.4701045Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T12:52:22.4701597Z With the provided path, there will be 1 file uploaded 2025-12-04T12:52:22.4702110Z Uploading to s3 prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:52:22.4745398Z Starting upload of logs-test-distributed-1-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084892.zip 2025-12-04T12:52:22.6875416Z Finished upload of logs-test-distributed-1-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084892.zip 2025-12-04T12:52:22.7048092Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T12:52:22.7048432Z with: 2025-12-04T12:52:22.7048684Z s3-bucket: gha-artifacts 2025-12-04T12:52:22.7049021Z s3-prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:52:22.7049372Z retention-days: 14 2025-12-04T12:52:22.7049636Z if-no-files-found: ignore 2025-12-04T12:52:22.7049916Z path: debug-*.zip 2025-12-04T12:52:22.7050143Z name: artifact 2025-12-04T12:52:22.7050388Z region: us-east-1 2025-12-04T12:52:22.7050620Z env: 2025-12-04T12:52:22.7050837Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:22.7051103Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:22.7051431Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:22.7052013Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:22.7052518Z ##[endgroup] 2025-12-04T12:52:23.0454407Z No files were found with the provided path: debug-*.zip. No artifacts will be uploaded. 2025-12-04T12:52:23.0623198Z ##[group]Run # shellcheck disable=SC2156 2025-12-04T12:52:23.0623620Z # shellcheck disable=SC2156 2025-12-04T12:52:23.0624297Z find . 
-iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2025-12-04T12:52:23.0631109Z shell: /usr/bin/bash -e {0} 2025-12-04T12:52:23.0631399Z env: 2025-12-04T12:52:23.0631642Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:23.0631953Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:23.0632306Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:23.0632925Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:23.0633590Z ##[endgroup] 2025-12-04T12:52:23.3800898Z ##[group]Run seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a 2025-12-04T12:52:23.3801455Z with: 2025-12-04T12:52:23.3801843Z name: coredumps-distributed-1-3-lf.linux.g4dn.12xlarge.nvidia.gpu 2025-12-04T12:52:23.3802311Z retention-days: 14 2025-12-04T12:52:23.3802592Z if-no-files-found: ignore 2025-12-04T12:52:23.3802892Z path: ./**/core.[1-9]* 2025-12-04T12:52:23.3803169Z s3-bucket: gha-artifacts 2025-12-04T12:52:23.3803462Z region: us-east-1 2025-12-04T12:52:23.3803705Z env: 2025-12-04T12:52:23.3803928Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:23.3804200Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:23.3804540Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:23.3805239Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:23.3805767Z ##[endgroup] 2025-12-04T12:52:30.9631409Z No files were found with the provided path: ./**/core.[1-9]*. No artifacts will be uploaded. 2025-12-04T12:52:30.9894285Z Prepare all required actions 2025-12-04T12:52:30.9894751Z Getting action download info 2025-12-04T12:52:31.2200167Z Download action repository 'actions/setup-python@v6' (SHA:83679a892e2d95755f2dac6acb0bfd1e9ac5d548) 2025-12-04T12:52:31.6097583Z ##[group]Run ./.github/actions/upload-utilization-stats 2025-12-04T12:52:31.6097998Z with: 2025-12-04T12:52:31.6098245Z job_id: 57116084892 2025-12-04T12:52:31.6098907Z job_name: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 1, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T12:52:31.6099640Z workflow_name: trunk 2025-12-04T12:52:31.6099929Z workflow_run_id: 19922768520 2025-12-04T12:52:31.6100251Z workflow_attempt: 1 2025-12-04T12:52:31.6100538Z env: 2025-12-04T12:52:31.6100766Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:31.6101071Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:31.6101436Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:31.6102127Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:31.6102696Z ##[endgroup] 2025-12-04T12:52:31.6143615Z ##[group]Run actions/setup-python@v6 2025-12-04T12:52:31.6143974Z with: 2025-12-04T12:52:31.6144217Z python-version: 3.10 2025-12-04T12:52:31.6144520Z check-latest: false 2025-12-04T12:52:31.6144929Z token: *** 2025-12-04T12:52:31.6145205Z update-environment: true 2025-12-04T12:52:31.6145520Z allow-prereleases: false 2025-12-04T12:52:31.6145943Z freethreaded: false 2025-12-04T12:52:31.6146311Z env: 2025-12-04T12:52:31.6146517Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:31.6146792Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:31.6147118Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:31.6147694Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:31.6148209Z ##[endgroup] 2025-12-04T12:52:31.7698706Z ##[group]Installed versions 2025-12-04T12:52:31.7708146Z Version 3.10 was not found in the local cache 
2025-12-04T12:52:31.7908912Z (node:457580) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead. 2025-12-04T12:52:31.7909848Z (Use `node --trace-deprecation ...` to show where the warning was created) 2025-12-04T12:52:32.1534138Z ##[error]The version '3.10' with architecture 'x64' was not found for this operating system. The list of all available versions can be found here: https://raw.githubusercontent.com/actions/python-versions/main/versions-manifest.json 2025-12-04T12:52:32.1689505Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main 2025-12-04T12:52:32.1690006Z with: 2025-12-04T12:52:32.1690241Z env: 2025-12-04T12:52:32.1690472Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:32.1690793Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:32.1691269Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:32.1691966Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:32.1692482Z ##[endgroup] 2025-12-04T12:52:32.1733551Z ##[group]Run set -eou pipefail 2025-12-04T12:52:32.1734178Z set -eou pipefail 2025-12-04T12:52:32.1734509Z  2025-12-04T12:52:32.1734940Z echo "Holding runner for 2 hours until all ssh sessions have logged out" 2025-12-04T12:52:32.1735534Z for _ in $(seq 1440); do 2025-12-04T12:52:32.1736078Z  # Break if no ssh session exists anymore 2025-12-04T12:52:32.1742882Z  if [ "$(who)" = "" ]; then 2025-12-04T12:52:32.1743323Z  break 2025-12-04T12:52:32.1743585Z  fi 2025-12-04T12:52:32.1743847Z  echo "." 2025-12-04T12:52:32.1744136Z  sleep 5 2025-12-04T12:52:32.1744396Z done 2025-12-04T12:52:32.1750934Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:52:32.1751346Z env: 2025-12-04T12:52:32.1772190Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:32.1772485Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:32.1772790Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:32.1773412Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:32.1774120Z ##[endgroup] 2025-12-04T12:52:32.1800235Z Holding runner for 2 hours until all ssh sessions have logged out 2025-12-04T12:52:32.1881842Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T12:52:32.1882489Z # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T12:52:32.1883130Z # shellcheck disable=SC2046 2025-12-04T12:52:32.1883507Z docker stop $(docker ps -q) || true 2025-12-04T12:52:32.1883906Z # Prune all of the docker images 2025-12-04T12:52:32.1884286Z docker system prune -af 2025-12-04T12:52:32.1890107Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:52:32.1890543Z env: 2025-12-04T12:52:32.1890788Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:32.1891208Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:32.1891648Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:32.1892231Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:32.1892845Z ##[endgroup] 2025-12-04T12:52:42.9788419Z f2da02c9e7d7 2025-12-04T12:52:43.5829885Z Deleted Containers: 2025-12-04T12:52:43.5830373Z f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:43.5830762Z 2025-12-04T12:52:51.0166295Z Deleted Images: 2025-12-04T12:52:51.0166724Z untagged: public.ecr.aws/docker/library/python:3.13 2025-12-04T12:52:51.0167557Z untagged: 
public.ecr.aws/docker/library/python@sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0 2025-12-04T12:52:51.0168515Z deleted: sha256:44438aecfedf7b6086fce506dae0db5ba7fc0027f9b743f1a75a6b5cbc7de70a 2025-12-04T12:52:51.0169273Z deleted: sha256:6f09a1f5d8a107c2532fbd116e75116cb75fa77b1a7d72d3bdf1ac12de152acd 2025-12-04T12:52:51.0170008Z deleted: sha256:fe5f3ac0be086125eb1e3cd10cc33e8e426f4e079381f7ce5a987b626e99fa67 2025-12-04T12:52:51.0170730Z deleted: sha256:79dd2061a22cf919cfc4f1f02704bfda09afadb017265e670ee54441d296c06c 2025-12-04T12:52:51.0171472Z deleted: sha256:9447ad402aafdbee17e999b0ec84ad89c2646dbebf054d469d4f8bee77f66212 2025-12-04T12:52:51.0172197Z deleted: sha256:7a4909f3c1975be52292f53107495ee1b41c17494918767ccedf1cf1688ae318 2025-12-04T12:52:51.0172893Z deleted: sha256:3474923d97f1f498237650a7d51bd4aea37d5e6b9d8a778777920584af5dd560 2025-12-04T12:52:51.0174229Z deleted: sha256:683afd1773444401a9cbd24842ee5d9154a11abb4fab63ddea5c03df788597ee 2025-12-04T12:52:51.0175443Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T12:52:51.0176961Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image@sha256:ba21003510dba4bdeed83df81a56fa468e0ee1b612a9445ae1f402a280804f97 2025-12-04T12:52:51.0178023Z deleted: sha256:add7313791033822205cdb3cf32096534b2cfaa4855bd48119b59000bfe00301 2025-12-04T12:52:51.0178998Z deleted: sha256:85a76b7bf29ad34eb76cce6f46af5d49a58b6272f80f983d5c769e82c7749301 2025-12-04T12:52:51.0179772Z deleted: sha256:0882f3ce59ff5ae30195ee4b059fc713e13eda107a3a7814a4616ac9058a30a4 2025-12-04T12:52:51.0180523Z deleted: sha256:64ba5b9344c11a3e4729136076830b90ac4cf1554046edb1bd4f0784b66ebd9b 2025-12-04T12:52:51.0181255Z deleted: sha256:88213c59cf461a65ab9b6cb07b4195dc9d41b5241c152daa002c7b3112e09124 2025-12-04T12:52:51.0182022Z deleted: sha256:4c0f83afa802ffbc05ebaf1aa50e48a2447c7c295549a6dded80ac63437906ca 2025-12-04T12:52:51.0182783Z deleted: sha256:6f7ec74460e8fb070c8209949095ea3be5f4e2fd69c9f750cd39ac4093f5e64b 2025-12-04T12:52:51.0183534Z deleted: sha256:d6928b0d1021b31942fdcb64e5eb4a34682de66e959dd424ed6ed02c29cd706d 2025-12-04T12:52:51.0184360Z deleted: sha256:4e9fbcb1705a6351bb34dd320558752614308636b94fd9ae6f26063e3deadc0a 2025-12-04T12:52:51.0185100Z deleted: sha256:43aabd0201f48712f21758071352dea029b4de37be08b2e2197706856a9ecbf2 2025-12-04T12:52:51.0185837Z deleted: sha256:940a98dec78303f0548beb1033242a45e9097607ef3e55c8b949b69b73d1b95e 2025-12-04T12:52:51.0186570Z deleted: sha256:d2849fa0e0411cf66e4408831d70e38838afb55b11a80c1c4d8aa0ae7dc9ca40 2025-12-04T12:52:51.0187308Z deleted: sha256:14f40d23c20c7e562623f89deb376520296758bc39dd3c77284049b84ebd8a31 2025-12-04T12:52:51.0188061Z deleted: sha256:a8ccba61f90ca097cb391d0f4fbed0d9f821d06b00e28f7332e9e2dcfcbac4ca 2025-12-04T12:52:51.0188815Z deleted: sha256:91b2060d290547d3b517d4a11d994bbe23f4560b5546cb91918ca1828dde6be1 2025-12-04T12:52:51.0189537Z deleted: sha256:b42a184755715dcfead7fad655a127433541d316d9628f5f730ff17ad5f8071c 2025-12-04T12:52:51.0190289Z deleted: sha256:aa5b4f3c9169061dc3c6da0e677e8a86f11ecb0a3f9fb4861ab3d8c04379775c 2025-12-04T12:52:51.0191152Z deleted: sha256:b4dcf450081a48d77fea0a21b8d810a69c03608a595e754fe7d365058d0579b7 2025-12-04T12:52:51.0191872Z deleted: sha256:4f7fe12d3d4f5bf890c7ada4ce16f17a105472aa6509a778f917dcce2f28174b 2025-12-04T12:52:51.0192606Z deleted: sha256:2d1d5a74182594f9a8553df00fdcfc809dba407bcd6700d667f862cbe9d555ce 
2025-12-04T12:52:51.0193347Z deleted: sha256:d901e2f5d449aeed16b727bdcc11fc0e0f6c30c8fc5c39ac7eeac8a74d9d176c 2025-12-04T12:52:51.0194146Z deleted: sha256:a04df2603bd12372c6632469a9a81ebc4a8d677452c250672b9692884fa6a452 2025-12-04T12:52:51.0194849Z deleted: sha256:f438a6b52273a552dc3820a55c74c53a62a0eae9f2a7d21b37125add7d71639f 2025-12-04T12:52:51.0195566Z deleted: sha256:d4b09517e9518d709ac98b0ae6f8446ec9ac51688253607b1fca67aa2c87b3f4 2025-12-04T12:52:51.0196289Z deleted: sha256:c1fa38335237f5e7263e39d3d3de98215bcfbbb12b826955c02e149bf68efd13 2025-12-04T12:52:51.0196998Z deleted: sha256:c898d20a30de901fca74d7611663b17ab48e1726a11e031e40548ed16ee81877 2025-12-04T12:52:51.0197722Z deleted: sha256:3baceec7096518fcc10696feba551639d698b3145c2fc09cac927bb60c0fd751 2025-12-04T12:52:51.0198469Z deleted: sha256:5245aaaa3d5c3a19f76b9a6c920bd82d1a0ff5289f87c8c109652089709d9b3b 2025-12-04T12:52:51.0199177Z deleted: sha256:f05cc789b95246938c377f474c41187965b89ceac0250e7d5124bec32153f447 2025-12-04T12:52:51.0199950Z deleted: sha256:07ec4fc008de4e7a2c794ec7094cc72e0d287c04c8b2156163aee0bae147fe2d 2025-12-04T12:52:51.0200684Z deleted: sha256:c6302601ad5fde573c1f8c900250478fca7fdc6907d8fd4fae651b94b4d9264d 2025-12-04T12:52:51.0201415Z deleted: sha256:cc5e955ee1dc54931f02606c5ea87aae14f03b5d764492be611480ab041f2882 2025-12-04T12:52:51.0202280Z deleted: sha256:f21c03518996d98452338f4e80bcfd9b139a1dab155f4830be0d3f623035269f 2025-12-04T12:52:51.0203104Z deleted: sha256:519ca6f1279f7886f25f0005527cfa627deebbc5b7d7cdbfa7ef962bcfc4c26d 2025-12-04T12:52:51.0203807Z deleted: sha256:0ef990495216807d0175b192045be3f617e72331bc373b3434807f41bf69168d 2025-12-04T12:52:51.0204677Z deleted: sha256:7093edf7319e1f0e01654c3224e32c8dede5b948d106e0b9b03cbf0bb1091e33 2025-12-04T12:52:51.0205431Z deleted: sha256:c478161e058e2f4041555c3e880b95ee1ee047938dc58549a3a88135740996ae 2025-12-04T12:52:51.0206161Z deleted: sha256:9bb853b0d938cd7c36a80ce8ee40653f2c0ff92719209b11beb03acc8855ce3e 2025-12-04T12:52:51.0206987Z deleted: sha256:fdf2ace71a78ce6910ef9c4b073c195531da47022443b606bb92dcd6499b6afc 2025-12-04T12:52:51.0207725Z deleted: sha256:576c2b3770d871937d3cfb7014328bcb4bd1aed0c28bc438764b3bfdac4c1ac2 2025-12-04T12:52:51.0208449Z deleted: sha256:878e92b9cb82de09ac14a9d5f3f7bc2411a799b6f54d0d64b78c2bb4d1fdc0fc 2025-12-04T12:52:51.0209178Z deleted: sha256:85c8c3b98b65a6695f988a10cc66c981d73a3ef03eda15b8e14d227b50b56300 2025-12-04T12:52:51.0209924Z deleted: sha256:ce2ab3ba07794f9ee95d6ea7de6dcd3d2aed96561f9a79192dd56ca5bf29313a 2025-12-04T12:52:51.0210642Z deleted: sha256:37a6e12976ca957286977e696e63012ab9821214b0483fe1a48d29dcb280508a 2025-12-04T12:52:51.0211363Z deleted: sha256:cd1d5d3dd7038144ca6fe961c0d4c8e705625ae0c36190ba8b3e9602abedad19 2025-12-04T12:52:51.0212130Z deleted: sha256:0e707276e0be2e0008b86d594fadc0d16444d66c4fb7227c56f144cbb3c2affd 2025-12-04T12:52:51.0212860Z deleted: sha256:22d4aad6a2ada91b341c1225a0f314042b8aeabef7568c5c019709b058bf070b 2025-12-04T12:52:51.0213650Z deleted: sha256:ee4adacf4e0933131d0275eddad406b3c8147e6cf07a292b99f1aff4b5355f33 2025-12-04T12:52:51.0214580Z deleted: sha256:43da0b9e7c0e18403dcb834e53628dc7c970ccb2dbd091878c0d7c0170dbc97f 2025-12-04T12:52:51.0215338Z deleted: sha256:00571684bdcd75beda15eb7d4e79b5458bc914350f9bb4d87fcdc97ad15e0da1 2025-12-04T12:52:51.0216073Z deleted: sha256:41615f09950259f1d75e82ef35b6fc53b18fe71ebff143744cfd51009d04349e 2025-12-04T12:52:51.0216824Z deleted: sha256:75ab34d2eed3c7915467a506ab6dab2711918fbabe94add2fb5c62780221ab0c 2025-12-04T12:52:51.0217589Z 
deleted: sha256:0a39ef2bebf44c1c3893d1e5fb42dad48b8fac7ca673141267ee967f85455e89 2025-12-04T12:52:51.0218341Z deleted: sha256:9b7d024e48ba1f9824a54597621b1b062cbc4aa41a77d81ca538d6b5c24a612c 2025-12-04T12:52:51.0219069Z deleted: sha256:392257172de6434c271bd93394218a91e9aa86d7c18abc2f2759317b9d5fb6de 2025-12-04T12:52:51.0219796Z deleted: sha256:6c3232860b930866a463a356124fc392c7e5f04895695229257e8c3e8a02711d 2025-12-04T12:52:51.0220528Z deleted: sha256:63dd55b807215e2fa6c715419ac0c5072d02dddc848dbf74bb7e77b906b5eaed 2025-12-04T12:52:51.0221270Z deleted: sha256:07a8738c1b4584db72ed9aa60f5274321eb0ba16263450da3a75df8326ebc25f 2025-12-04T12:52:51.0222034Z deleted: sha256:053fe2965b01281d12040ec1893e0d1aa77362a49ea9a1067402272c69dad9f5 2025-12-04T12:52:51.0222776Z deleted: sha256:7857fb5eb181c4e80262ecab60bdd3c266cf3d1409ceb76c05882609b416a8d3 2025-12-04T12:52:51.0223522Z deleted: sha256:752528477fc99089de3bd2c6da7b30cf34f2e901fe06d8fcfe685b411461e883 2025-12-04T12:52:51.0224261Z deleted: sha256:cce0210e2f4b042601813df03aa294a86b0c668fcfc75f4c63f6fa12b2952e15 2025-12-04T12:52:51.0225125Z deleted: sha256:f2bb405a26705ecd12d21380d26d9355d01db3a2175080fbdb468f2b5a25a76c 2025-12-04T12:52:51.0225869Z deleted: sha256:ad430120d4ffbaf97cd8d6de6ea8eefa4a8f80ec45f0b176c6b26bff0970fd33 2025-12-04T12:52:51.0226669Z deleted: sha256:225a4910baea7cc540ed43eeac75046293800ab0b8e0192b51e991c8cb50bcf3 2025-12-04T12:52:51.0227329Z deleted: sha256:a259945b0c3507f049fbac10fb3d3ffe43d45e83c91b80ae8cd1dafb855ad83c 2025-12-04T12:52:51.0227991Z deleted: sha256:862a98881b1d5adad5c21d01602773b894794097de80964ef8f47bcaadb43255 2025-12-04T12:52:51.0228646Z deleted: sha256:1cf6d3c8b6c2694b79a2d08719594903811c330a36a4c7a8a7153a350b53d292 2025-12-04T12:52:51.0229296Z deleted: sha256:232a1ae8b0fee817ff7838bb5986a2f38377d3b1dbbf5217b576af0f953b0844 2025-12-04T12:52:51.0229963Z deleted: sha256:c72c5705dabd6314423dd7d4fb260a20d5d9886b2ebce60d19e9d78c4a2335c2 2025-12-04T12:52:51.0230685Z deleted: sha256:296734cf81fd92c913884d058908598424ffe072676e38de289bbab83768c7bd 2025-12-04T12:52:51.0231334Z deleted: sha256:7c76040481b889847a1804021aeff07547eaa4ee706d6137db218d497a8fd9c1 2025-12-04T12:52:51.0231989Z deleted: sha256:d5e293f5b354e8cbcc6de893ea72cc632b02d8fdfbb08ec3127c4e9662f3ebff 2025-12-04T12:52:51.0232667Z deleted: sha256:f35a64e429c88e249645090f21fbe7dae108d98e0ab4ea13184f24b3fd66c315 2025-12-04T12:52:51.0233333Z deleted: sha256:ce6ae8d595c8e69115c51b1ce4f9a9158795d7b863b1cb53f21c39a87974d41b 2025-12-04T12:52:51.0233992Z deleted: sha256:8941abaee59400fb9b3a60765fea4a1fc2a6a447467a6d983e84c7f72494a450 2025-12-04T12:52:51.0234677Z deleted: sha256:ef53c29a9a2c2bc80ffdb9bfaf92842436b5755ec1ce828b9d11e5e27d656ea1 2025-12-04T12:52:51.0235360Z deleted: sha256:7a347fb0acb43f1c814f8c8ff21185e8b5cf64d7bc5988cea060f77d906e08b5 2025-12-04T12:52:51.0236031Z deleted: sha256:cc855dc9be79496e15175569dced2d13477e50b077a5fd3945f9bf50018880c1 2025-12-04T12:52:51.0236686Z deleted: sha256:f7a9946ada3d4786658bc0b643808bb32a9a45e4e90e30dc43ee19e2dbe24024 2025-12-04T12:52:51.0237347Z deleted: sha256:c22a9215f62812c1d2e32827f5221ff556c5b6702aadbdab6b87b8293f19635e 2025-12-04T12:52:51.0238008Z deleted: sha256:959a56746620012e37c1def1a83c5afb1e7c0adc59b021a28beb53c24df98032 2025-12-04T12:52:51.0238709Z deleted: sha256:31a0fff0695bf6100c17954be72eab2095b466d559c75c3faf2a17d8c41e6ebe 2025-12-04T12:52:51.0239356Z deleted: sha256:c15e2b5241b9e55af1b2593e544391b4b44d0505e6528e8f12425136e93b424c 2025-12-04T12:52:51.0240011Z deleted: 
sha256:73974f74b436f39a2fdb6461b1e3f7c3e41c73325776fa71d16b942a5b4a365b 2025-12-04T12:52:51.0240398Z 2025-12-04T12:52:51.0240525Z Total reclaimed space: 35.57GB 2025-12-04T12:52:51.0276970Z ##[group]Run set +e 2025-12-04T12:52:51.0277329Z set +e 2025-12-04T12:52:51.0277588Z set -x 2025-12-04T12:52:51.0277822Z  2025-12-04T12:52:51.0278061Z nvidia-smi 2025-12-04T12:52:51.0278577Z # NB: Surprisingly, nvidia-smi command returns successfully with return code 0 even in 2025-12-04T12:52:51.0279765Z # the case where the driver has already crashed as it still can get the driver version 2025-12-04T12:52:51.0280540Z # and some basic information like the bus ID. However, the rest of the information 2025-12-04T12:52:51.0281163Z # would be missing (ERR!), for example: 2025-12-04T12:52:51.0281541Z # 2025-12-04T12:52:51.0281898Z # +-----------------------------------------------------------------------------+ 2025-12-04T12:52:51.0282505Z # | NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 | 2025-12-04T12:52:51.0283267Z # |-------------------------------+----------------------+----------------------+ 2025-12-04T12:52:51.0283891Z # | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T12:52:51.0284563Z # | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | 2025-12-04T12:52:51.0285126Z # | | | MIG M. | 2025-12-04T12:52:51.0285559Z # |===============================+======================+======================| 2025-12-04T12:52:51.0286045Z # | 0 ERR! Off | 00000000:00:1E.0 Off | ERR! | 2025-12-04T12:52:51.0286598Z # |ERR! ERR! ERR! ERR! / ERR! | 4184MiB / 23028MiB | ERR! Default | 2025-12-04T12:52:51.0287109Z # | | | ERR! | 2025-12-04T12:52:51.0287609Z # +-------------------------------+----------------------+----------------------+ 2025-12-04T12:52:51.0288046Z # 2025-12-04T12:52:51.0288394Z # +-----------------------------------------------------------------------------+ 2025-12-04T12:52:51.0288929Z # | Processes: | 2025-12-04T12:52:51.0289487Z # | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T12:52:51.0289997Z # | ID ID Usage | 2025-12-04T12:52:51.0290434Z # |=============================================================================| 2025-12-04T12:52:51.0291026Z # +-----------------------------------------------------------------------------+ 2025-12-04T12:52:51.0291544Z # 2025-12-04T12:52:51.0291930Z # This should be reported as a failure instead as it will guarantee to fail when 2025-12-04T12:52:51.0292459Z # Docker tries to run with --gpus all 2025-12-04T12:52:51.0292789Z # 2025-12-04T12:52:51.0293173Z # So, the correct check here is to query one of the missing piece of info like 2025-12-04T12:52:51.0293956Z # GPU name, so that the command can fail accordingly 2025-12-04T12:52:51.0294530Z nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T12:52:51.0295033Z NVIDIA_SMI_STATUS=$? 2025-12-04T12:52:51.0295330Z  2025-12-04T12:52:51.0295841Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action 2025-12-04T12:52:51.0296678Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then 2025-12-04T12:52:51.0297363Z  echo "NVIDIA driver installation has failed, shutting down the runner..." 
2025-12-04T12:52:51.0297947Z  .github/scripts/stop_runner_service.sh 2025-12-04T12:52:51.0298333Z fi 2025-12-04T12:52:51.0298572Z  2025-12-04T12:52:51.0299140Z # For runner with multiple GPUs, we also want to confirm that the number of GPUs are the 2025-12-04T12:52:51.0299881Z # power of 2, i.e. 1, 2, 4, or 8. This is to avoid flaky test issue when one GPU fails 2025-12-04T12:52:51.0300497Z # https://github.com/pytorch/test-infra/issues/4000 2025-12-04T12:52:51.0301010Z GPU_COUNT=$(nvidia-smi --list-gpus | wc -l) 2025-12-04T12:52:51.0301413Z NVIDIA_SMI_STATUS=$? 2025-12-04T12:52:51.0301721Z  2025-12-04T12:52:51.0302228Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action 2025-12-04T12:52:51.0302992Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then 2025-12-04T12:52:51.0303662Z  echo "NVIDIA driver installation has failed, shutting down the runner..." 2025-12-04T12:52:51.0304255Z  .github/scripts/stop_runner_service.sh 2025-12-04T12:52:51.0304639Z fi 2025-12-04T12:52:51.0304900Z  2025-12-04T12:52:51.0305188Z # Check the GPU count to be a power of 2 2025-12-04T12:52:51.0305847Z if [ "$GPU_COUNT" -le 8 ] && [ "$GPU_COUNT" -ne 1 ] && [ "$GPU_COUNT" -ne 2 ] && [ "$GPU_COUNT" -ne 4 ] && [ "$GPU_COUNT" -ne 8 ]; then 2025-12-04T12:52:51.0306763Z  echo "NVIDIA driver detects $GPU_COUNT GPUs. The runner has a broken GPU, shutting it down..." 2025-12-04T12:52:51.0307351Z  .github/scripts/stop_runner_service.sh 2025-12-04T12:52:51.0307696Z fi 2025-12-04T12:52:51.0314800Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:52:51.0315193Z env: 2025-12-04T12:52:51.0315421Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:52:51.0315695Z HAS_NVIDIA_GPU: true 2025-12-04T12:52:51.0316007Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:52:51.0316590Z DOCKER_CONTAINER_ID: f2da02c9e7d7602d7bdf1034139039d2ae520ce1b20d3258c8e69096bee36221 2025-12-04T12:52:51.0317111Z ##[endgroup] 2025-12-04T12:52:51.0345377Z + nvidia-smi 2025-12-04T12:52:51.0801773Z Thu Dec 4 12:52:51 2025 2025-12-04T12:52:51.0802458Z +-----------------------------------------------------------------------------------------+ 2025-12-04T12:52:51.0803104Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 | 2025-12-04T12:52:51.0803711Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T12:52:51.0804324Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T12:52:51.0804964Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-12-04T12:52:51.0805509Z | | | MIG M. 
| 2025-12-04T12:52:51.0805910Z |=========================================+========================+======================| 2025-12-04T12:52:51.1438372Z | 0 Tesla T4 On | 00000000:00:1B.0 Off | 0 | 2025-12-04T12:52:51.1439493Z | N/A 24C P8 13W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T12:52:51.1440005Z | | | N/A | 2025-12-04T12:52:51.1440492Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T12:52:51.1441032Z | 1 Tesla T4 On | 00000000:00:1C.0 Off | 0 | 2025-12-04T12:52:51.1441543Z | N/A 24C P8 13W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T12:52:51.1442208Z | | | N/A | 2025-12-04T12:52:51.1442673Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T12:52:51.1443199Z | 2 Tesla T4 On | 00000000:00:1D.0 Off | 0 | 2025-12-04T12:52:51.1443840Z | N/A 25C P8 13W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T12:52:51.1444292Z | | | N/A | 2025-12-04T12:52:51.1444774Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T12:52:51.1445295Z | 3 Tesla T4 On | 00000000:00:1E.0 Off | 0 | 2025-12-04T12:52:51.1445799Z | N/A 24C P8 13W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T12:52:51.1446243Z | | | N/A | 2025-12-04T12:52:51.1446717Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T12:52:51.1447082Z 2025-12-04T12:52:51.1447287Z +-----------------------------------------------------------------------------------------+ 2025-12-04T12:52:51.1447880Z | Processes: | 2025-12-04T12:52:51.1448405Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T12:52:51.1448905Z | ID ID Usage | 2025-12-04T12:52:51.1449319Z |=========================================================================================| 2025-12-04T12:52:51.1469226Z | No running processes found | 2025-12-04T12:52:51.1469991Z +-----------------------------------------------------------------------------------------+ 2025-12-04T12:52:51.7834047Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T12:52:51.8017503Z Tesla T4 2025-12-04T12:52:51.8286998Z + NVIDIA_SMI_STATUS=0 2025-12-04T12:52:51.8287355Z + '[' 0 -ne 0 ']' 2025-12-04T12:52:51.8293701Z ++ nvidia-smi --list-gpus 2025-12-04T12:52:51.8294561Z ++ wc -l 2025-12-04T12:52:51.8748060Z + GPU_COUNT=4 2025-12-04T12:52:51.8748447Z + NVIDIA_SMI_STATUS=0 2025-12-04T12:52:51.8748798Z + '[' 0 -ne 0 ']' 2025-12-04T12:52:51.8749498Z + '[' 4 -le 8 ']' 2025-12-04T12:52:51.8749840Z + '[' 4 -ne 1 ']' 2025-12-04T12:52:51.8750106Z + '[' 4 -ne 2 ']' 2025-12-04T12:52:51.8750380Z + '[' 4 -ne 4 ']' 2025-12-04T12:52:51.8821081Z Post job cleanup. 2025-12-04T12:52:51.8902388Z Post job cleanup. 2025-12-04T12:52:51.8949115Z Post job cleanup. 
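For reference, a minimal standalone sketch of the GPU health check the step above performs, under these assumptions: nvidia-smi is on PATH, and the runner-shutdown hook (.github/scripts/stop_runner_service.sh in the step above) is replaced by a plain non-zero exit. Variable names (status, gpu_count) are illustrative only.

#!/usr/bin/env bash
# Sketch of the runner GPU health check, not the workflow's exact script.
set -u

# Plain `nvidia-smi` can return 0 even when the driver has crashed, so query a
# field (gpu_name) that goes missing in that state to force a non-zero status.
nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
status=$?
# 0 is healthy; 14 is also accepted, mirroring the acceptable codes the step
# above says it copied from the setup-nvidia GitHub action.
if [ "$status" -ne 0 ] && [ "$status" -ne 14 ]; then
  echo "NVIDIA driver looks unhealthy (nvidia-smi exit status $status)" >&2
  exit 1
fi

# On multi-GPU runners, require the visible GPU count to be a power of two
# (1, 2, 4, or 8) so a single dead GPU is caught here instead of surfacing as
# flaky tests later.
gpu_count=$(nvidia-smi --list-gpus | wc -l)
case "$gpu_count" in
  1|2|4|8) echo "GPU count $gpu_count looks healthy" ;;
  *) echo "NVIDIA driver detects $gpu_count GPUs; runner looks broken" >&2; exit 1 ;;
esac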
2025-12-04T12:52:51.9961341Z [command]/usr/bin/git version 2025-12-04T12:52:52.0005279Z git version 2.50.1 2025-12-04T12:52:52.0043023Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/abc89a0c-2ffe-49b9-ab72-fe06bb1488c4/.gitconfig' 2025-12-04T12:52:52.0053176Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/abc89a0c-2ffe-49b9-ab72-fe06bb1488c4' before making global git config changes 2025-12-04T12:52:52.0054609Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T12:52:52.0058859Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T12:52:52.0097480Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T12:52:52.0136464Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T12:52:52.0465388Z Entering 'android/libs/fbjni' 2025-12-04T12:52:52.0525072Z Entering 'third_party/FP16' 2025-12-04T12:52:52.0583074Z Entering 'third_party/FXdiv' 2025-12-04T12:52:52.0641676Z Entering 'third_party/NNPACK' 2025-12-04T12:52:52.0699874Z Entering 'third_party/NVTX' 2025-12-04T12:52:52.0761715Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T12:52:52.0821185Z Entering 'third_party/XNNPACK' 2025-12-04T12:52:52.0901035Z Entering 'third_party/aiter' 2025-12-04T12:52:52.0960500Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T12:52:52.1036689Z Entering 'third_party/benchmark' 2025-12-04T12:52:52.1096607Z Entering 'third_party/composable_kernel' 2025-12-04T12:52:52.1164777Z Entering 'third_party/cpp-httplib' 2025-12-04T12:52:52.1223907Z Entering 'third_party/cpuinfo' 2025-12-04T12:52:52.1282230Z Entering 'third_party/cudnn_frontend' 2025-12-04T12:52:52.1341586Z Entering 'third_party/cutlass' 2025-12-04T12:52:52.1413718Z Entering 'third_party/fbgemm' 2025-12-04T12:52:52.1476887Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T12:52:52.1534723Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T12:52:52.1602581Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T12:52:52.1658557Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T12:52:52.1725826Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T12:52:52.1783866Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T12:52:52.1840499Z Entering 'third_party/fbgemm/external/json' 2025-12-04T12:52:52.1900036Z Entering 'third_party/flash-attention' 2025-12-04T12:52:52.1961805Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T12:52:52.2025317Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T12:52:52.2094713Z Entering 'third_party/flatbuffers' 2025-12-04T12:52:52.2156704Z Entering 'third_party/fmt' 2025-12-04T12:52:52.2215591Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T12:52:52.2274392Z Entering 'third_party/gloo' 2025-12-04T12:52:52.2336046Z Entering 'third_party/googletest' 2025-12-04T12:52:52.2394916Z Entering 'third_party/ideep' 2025-12-04T12:52:52.2452622Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T12:52:52.2520780Z Entering 'third_party/ittapi' 2025-12-04T12:52:52.2576424Z Entering 'third_party/kineto' 2025-12-04T12:52:52.2635761Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T12:52:52.2695419Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T12:52:52.2755568Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T12:52:52.2815973Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T12:52:52.2873767Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T12:52:52.2931137Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T12:52:52.3001226Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T12:52:52.3058172Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T12:52:52.3116968Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T12:52:52.3175617Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T12:52:52.3234933Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T12:52:52.3293717Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:52:52.3355418Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:52:52.3416105Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T12:52:52.3473538Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T12:52:52.3535350Z Entering 'third_party/kleidiai' 2025-12-04T12:52:52.3595894Z Entering 'third_party/mimalloc' 2025-12-04T12:52:52.3654183Z Entering 'third_party/nlohmann' 2025-12-04T12:52:52.3715817Z Entering 'third_party/onnx' 2025-12-04T12:52:52.3794980Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T12:52:52.3854469Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T12:52:52.3914993Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T12:52:52.3970774Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T12:52:52.4033239Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T12:52:52.4089601Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T12:52:52.4154471Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T12:52:52.4214463Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T12:52:52.4272487Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T12:52:52.4333209Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:52:52.4393396Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:52:52.4455461Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T12:52:52.4534768Z Entering 'third_party/pocketfft' 2025-12-04T12:52:52.4594000Z Entering 'third_party/protobuf' 2025-12-04T12:52:52.4655285Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T12:52:52.4713277Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T12:52:52.4771573Z Entering 'third_party/psimd' 2025-12-04T12:52:52.4831979Z Entering 'third_party/pthreadpool' 2025-12-04T12:52:52.4890825Z Entering 'third_party/pybind11' 2025-12-04T12:52:52.4952608Z Entering 'third_party/python-peachpy' 2025-12-04T12:52:52.5009781Z Entering 'third_party/sleef' 2025-12-04T12:52:52.5069406Z Entering 'third_party/tensorpipe' 2025-12-04T12:52:52.5128905Z 
Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T12:52:52.5183619Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T12:52:52.5240812Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T12:52:52.5296764Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T12:52:52.5353336Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T12:52:52.5431195Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T12:52:52.5454997Z http.https://github.com/.extraheader 2025-12-04T12:52:52.5463720Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-12-04T12:52:52.5495199Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-12-04T12:52:52.5815682Z Entering 'android/libs/fbjni' 2025-12-04T12:52:52.5855072Z http.https://github.com/.extraheader 2025-12-04T12:52:52.5889259Z Entering 'third_party/FP16' 2025-12-04T12:52:52.5929713Z http.https://github.com/.extraheader 2025-12-04T12:52:52.5967625Z Entering 'third_party/FXdiv' 2025-12-04T12:52:52.6007659Z http.https://github.com/.extraheader 2025-12-04T12:52:52.6043626Z Entering 'third_party/NNPACK' 2025-12-04T12:52:52.6085328Z http.https://github.com/.extraheader 2025-12-04T12:52:52.6123233Z Entering 'third_party/NVTX' 2025-12-04T12:52:52.6162954Z http.https://github.com/.extraheader 2025-12-04T12:52:52.6200099Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T12:52:52.6240431Z http.https://github.com/.extraheader 2025-12-04T12:52:52.6276317Z Entering 'third_party/XNNPACK' 2025-12-04T12:52:52.6318277Z http.https://github.com/.extraheader 2025-12-04T12:52:52.6371137Z Entering 'third_party/aiter' 2025-12-04T12:52:52.6411913Z http.https://github.com/.extraheader 2025-12-04T12:52:52.6452534Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T12:52:52.6492138Z http.https://github.com/.extraheader 2025-12-04T12:52:52.6540074Z Entering 'third_party/benchmark' 2025-12-04T12:52:52.6577795Z http.https://github.com/.extraheader 2025-12-04T12:52:52.6614781Z Entering 'third_party/composable_kernel' 2025-12-04T12:52:52.6653127Z http.https://github.com/.extraheader 2025-12-04T12:52:52.6701457Z Entering 'third_party/cpp-httplib' 2025-12-04T12:52:52.6740922Z http.https://github.com/.extraheader 2025-12-04T12:52:52.6776825Z Entering 'third_party/cpuinfo' 2025-12-04T12:52:52.6815567Z http.https://github.com/.extraheader 2025-12-04T12:52:52.6854289Z Entering 'third_party/cudnn_frontend' 2025-12-04T12:52:52.6896669Z http.https://github.com/.extraheader 2025-12-04T12:52:52.6933655Z Entering 'third_party/cutlass' 2025-12-04T12:52:52.6972520Z http.https://github.com/.extraheader 2025-12-04T12:52:52.7016421Z Entering 'third_party/fbgemm' 2025-12-04T12:52:52.7055386Z http.https://github.com/.extraheader 2025-12-04T12:52:52.7098083Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T12:52:52.7136793Z http.https://github.com/.extraheader 2025-12-04T12:52:52.7175088Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T12:52:52.7214501Z http.https://github.com/.extraheader 2025-12-04T12:52:52.7262124Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T12:52:52.7301039Z http.https://github.com/.extraheader 2025-12-04T12:52:52.7337121Z Entering 'third_party/fbgemm/external/cutlass' 
2025-12-04T12:52:52.7375203Z http.https://github.com/.extraheader 2025-12-04T12:52:52.7420830Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T12:52:52.7459636Z http.https://github.com/.extraheader 2025-12-04T12:52:52.7496071Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T12:52:52.7534707Z http.https://github.com/.extraheader 2025-12-04T12:52:52.7574214Z Entering 'third_party/fbgemm/external/json' 2025-12-04T12:52:52.7612796Z http.https://github.com/.extraheader 2025-12-04T12:52:52.7657541Z Entering 'third_party/flash-attention' 2025-12-04T12:52:52.7699021Z http.https://github.com/.extraheader 2025-12-04T12:52:52.7735471Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T12:52:52.7774959Z http.https://github.com/.extraheader 2025-12-04T12:52:52.7817240Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T12:52:52.7855275Z http.https://github.com/.extraheader 2025-12-04T12:52:52.7913118Z Entering 'third_party/flatbuffers' 2025-12-04T12:52:52.7953021Z http.https://github.com/.extraheader 2025-12-04T12:52:52.8001052Z Entering 'third_party/fmt' 2025-12-04T12:52:52.8040989Z http.https://github.com/.extraheader 2025-12-04T12:52:52.8077273Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T12:52:52.8118156Z http.https://github.com/.extraheader 2025-12-04T12:52:52.8152918Z Entering 'third_party/gloo' 2025-12-04T12:52:52.8193766Z http.https://github.com/.extraheader 2025-12-04T12:52:52.8230542Z Entering 'third_party/googletest' 2025-12-04T12:52:52.8270899Z http.https://github.com/.extraheader 2025-12-04T12:52:52.8308658Z Entering 'third_party/ideep' 2025-12-04T12:52:52.8349474Z http.https://github.com/.extraheader 2025-12-04T12:52:52.8393230Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T12:52:52.8431669Z http.https://github.com/.extraheader 2025-12-04T12:52:52.8482553Z Entering 'third_party/ittapi' 2025-12-04T12:52:52.8522999Z http.https://github.com/.extraheader 2025-12-04T12:52:52.8558111Z Entering 'third_party/kineto' 2025-12-04T12:52:52.8598906Z http.https://github.com/.extraheader 2025-12-04T12:52:52.8635296Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T12:52:52.8673060Z http.https://github.com/.extraheader 2025-12-04T12:52:52.8709992Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T12:52:52.8749598Z http.https://github.com/.extraheader 2025-12-04T12:52:52.8794648Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T12:52:52.8832672Z http.https://github.com/.extraheader 2025-12-04T12:52:52.8873217Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T12:52:52.8913341Z http.https://github.com/.extraheader 2025-12-04T12:52:52.8954091Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T12:52:52.8993439Z http.https://github.com/.extraheader 2025-12-04T12:52:52.9034369Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T12:52:52.9072748Z http.https://github.com/.extraheader 2025-12-04T12:52:52.9112783Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T12:52:52.9151281Z http.https://github.com/.extraheader 2025-12-04T12:52:52.9190328Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T12:52:52.9231325Z http.https://github.com/.extraheader 2025-12-04T12:52:52.9271117Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T12:52:52.9311351Z http.https://github.com/.extraheader 2025-12-04T12:52:52.9350955Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T12:52:52.9390834Z http.https://github.com/.extraheader 2025-12-04T12:52:52.9434468Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T12:52:52.9472988Z http.https://github.com/.extraheader 2025-12-04T12:52:52.9513345Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:52:52.9551867Z http.https://github.com/.extraheader 2025-12-04T12:52:52.9592609Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:52:52.9630175Z http.https://github.com/.extraheader 2025-12-04T12:52:52.9671613Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T12:52:52.9711390Z http.https://github.com/.extraheader 2025-12-04T12:52:52.9749075Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T12:52:52.9785994Z http.https://github.com/.extraheader 2025-12-04T12:52:52.9822668Z Entering 'third_party/kleidiai' 2025-12-04T12:52:52.9861431Z http.https://github.com/.extraheader 2025-12-04T12:52:52.9900035Z Entering 'third_party/mimalloc' 2025-12-04T12:52:52.9939029Z http.https://github.com/.extraheader 2025-12-04T12:52:52.9975137Z Entering 'third_party/nlohmann' 2025-12-04T12:52:53.0014490Z http.https://github.com/.extraheader 2025-12-04T12:52:53.0053943Z Entering 'third_party/onnx' 2025-12-04T12:52:53.0094249Z http.https://github.com/.extraheader 2025-12-04T12:52:53.0149674Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T12:52:53.0186487Z http.https://github.com/.extraheader 2025-12-04T12:52:53.0225155Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T12:52:53.0265801Z http.https://github.com/.extraheader 2025-12-04T12:52:53.0303891Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T12:52:53.0341425Z http.https://github.com/.extraheader 2025-12-04T12:52:53.0376712Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T12:52:53.0414937Z http.https://github.com/.extraheader 2025-12-04T12:52:53.0454485Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T12:52:53.0492849Z http.https://github.com/.extraheader 2025-12-04T12:52:53.0530878Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T12:52:53.0569910Z http.https://github.com/.extraheader 2025-12-04T12:52:53.0606669Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T12:52:53.0645488Z http.https://github.com/.extraheader 2025-12-04T12:52:53.0680519Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T12:52:53.0720464Z http.https://github.com/.extraheader 2025-12-04T12:52:53.0755257Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T12:52:53.0795699Z http.https://github.com/.extraheader 2025-12-04T12:52:53.0831645Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:52:53.0869982Z http.https://github.com/.extraheader 2025-12-04T12:52:53.0911067Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:52:53.0950227Z http.https://github.com/.extraheader 2025-12-04T12:52:53.0987470Z Entering 
'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T12:52:53.1024903Z http.https://github.com/.extraheader 2025-12-04T12:52:53.1095159Z Entering 'third_party/pocketfft' 2025-12-04T12:52:53.1134102Z http.https://github.com/.extraheader 2025-12-04T12:52:53.1174319Z Entering 'third_party/protobuf' 2025-12-04T12:52:53.1214509Z http.https://github.com/.extraheader 2025-12-04T12:52:53.1253692Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T12:52:53.1290986Z http.https://github.com/.extraheader 2025-12-04T12:52:53.1332026Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T12:52:53.1370385Z http.https://github.com/.extraheader 2025-12-04T12:52:53.1408590Z Entering 'third_party/psimd' 2025-12-04T12:52:53.1448284Z http.https://github.com/.extraheader 2025-12-04T12:52:53.1483123Z Entering 'third_party/pthreadpool' 2025-12-04T12:52:53.1524087Z http.https://github.com/.extraheader 2025-12-04T12:52:53.1560093Z Entering 'third_party/pybind11' 2025-12-04T12:52:53.1600841Z http.https://github.com/.extraheader 2025-12-04T12:52:53.1636366Z Entering 'third_party/python-peachpy' 2025-12-04T12:52:53.1676003Z http.https://github.com/.extraheader 2025-12-04T12:52:53.1713508Z Entering 'third_party/sleef' 2025-12-04T12:52:53.1753450Z http.https://github.com/.extraheader 2025-12-04T12:52:53.1790020Z Entering 'third_party/tensorpipe' 2025-12-04T12:52:53.1833847Z http.https://github.com/.extraheader 2025-12-04T12:52:53.1871272Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T12:52:53.1910404Z http.https://github.com/.extraheader 2025-12-04T12:52:53.1954235Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T12:52:53.1993980Z http.https://github.com/.extraheader 2025-12-04T12:52:53.2033298Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T12:52:53.2070886Z http.https://github.com/.extraheader 2025-12-04T12:52:53.2105377Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T12:52:53.2147074Z http.https://github.com/.extraheader 2025-12-04T12:52:53.2181223Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T12:52:53.2219133Z http.https://github.com/.extraheader 2025-12-04T12:52:53.2291498Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:52:53.2322697Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T12:52:53.2648182Z Entering 'android/libs/fbjni' 2025-12-04T12:52:53.2675860Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T12:52:53.2694686Z Entering 'third_party/FP16' 2025-12-04T12:52:53.2722283Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T12:52:53.2737842Z Entering 'third_party/FXdiv' 2025-12-04T12:52:53.2765552Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T12:52:53.2782721Z Entering 'third_party/NNPACK' 2025-12-04T12:52:53.2810716Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T12:52:53.2829775Z Entering 'third_party/NVTX' 2025-12-04T12:52:53.2855275Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T12:52:53.2874821Z Entering 
'third_party/VulkanMemoryAllocator' 2025-12-04T12:52:53.2901092Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T12:52:53.2920270Z Entering 'third_party/XNNPACK' 2025-12-04T12:52:53.2948646Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T12:52:53.2988197Z Entering 'third_party/aiter' 2025-12-04T12:52:53.3014911Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T12:52:53.3033591Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T12:52:53.3056596Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T12:52:53.3084334Z Entering 'third_party/benchmark' 2025-12-04T12:52:53.3114651Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T12:52:53.3133074Z Entering 'third_party/composable_kernel' 2025-12-04T12:52:53.3160391Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T12:52:53.3186823Z Entering 'third_party/cpp-httplib' 2025-12-04T12:52:53.3214566Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T12:52:53.3232379Z Entering 'third_party/cpuinfo' 2025-12-04T12:52:53.3256514Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T12:52:53.3277061Z Entering 'third_party/cudnn_frontend' 2025-12-04T12:52:53.3304073Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T12:52:53.3327736Z Entering 'third_party/cutlass' 2025-12-04T12:52:53.3354220Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T12:52:53.3381802Z Entering 'third_party/fbgemm' 2025-12-04T12:52:53.3410118Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T12:52:53.3431481Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T12:52:53.3455757Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T12:52:53.3472741Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T12:52:53.3498347Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T12:52:53.3526996Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T12:52:53.3554381Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T12:52:53.3571816Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T12:52:53.3598551Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T12:52:53.3622756Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T12:52:53.3649712Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T12:52:53.3667729Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T12:52:53.3695518Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T12:52:53.3712710Z Entering 'third_party/fbgemm/external/json' 2025-12-04T12:52:53.3736701Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T12:52:53.3759412Z Entering 'third_party/flash-attention' 2025-12-04T12:52:53.3785048Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T12:52:53.3804896Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T12:52:53.3830030Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T12:52:53.3854894Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T12:52:53.3880595Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T12:52:53.3910472Z Entering 'third_party/flatbuffers' 2025-12-04T12:52:53.3935393Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T12:52:53.3957583Z Entering 'third_party/fmt' 2025-12-04T12:52:53.3983612Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T12:52:53.4002534Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T12:52:53.4030208Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T12:52:53.4048185Z Entering 'third_party/gloo' 2025-12-04T12:52:53.4075328Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T12:52:53.4095201Z Entering 'third_party/googletest' 2025-12-04T12:52:53.4126201Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T12:52:53.4142439Z Entering 'third_party/ideep' 2025-12-04T12:52:53.4170160Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T12:52:53.4185741Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T12:52:53.4212630Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T12:52:53.4241534Z Entering 'third_party/ittapi' 2025-12-04T12:52:53.4269729Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T12:52:53.4285780Z Entering 'third_party/kineto' 2025-12-04T12:52:53.4315707Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T12:52:53.4334222Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T12:52:53.4360028Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T12:52:53.4375438Z 
Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T12:52:53.4402917Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T12:52:53.4419720Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T12:52:53.4447058Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T12:52:53.4462684Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T12:52:53.4489304Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T12:52:53.4508993Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T12:52:53.4534596Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T12:52:53.4552097Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T12:52:53.4576513Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T12:52:53.4598543Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T12:52:53.4624077Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T12:52:53.4642625Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T12:52:53.4669751Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T12:52:53.4685725Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T12:52:53.4714590Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T12:52:53.4733538Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T12:52:53.4759893Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T12:52:53.4775545Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T12:52:53.4803173Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T12:52:53.4817408Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:52:53.4845107Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T12:52:53.4862889Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:52:53.4890072Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T12:52:53.4913715Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T12:52:53.4937697Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T12:52:53.4956458Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T12:52:53.4980412Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T12:52:53.5003297Z Entering 'third_party/kleidiai' 2025-12-04T12:52:53.5037456Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T12:52:53.5055030Z Entering 'third_party/mimalloc' 2025-12-04T12:52:53.5080922Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T12:52:53.5097655Z Entering 'third_party/nlohmann' 2025-12-04T12:52:53.5125273Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T12:52:53.5143170Z Entering 'third_party/onnx' 2025-12-04T12:52:53.5170632Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T12:52:53.5209159Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T12:52:53.5235734Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T12:52:53.5255401Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T12:52:53.5281056Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T12:52:53.5299212Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T12:52:53.5326468Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T12:52:53.5342036Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T12:52:53.5369203Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T12:52:53.5384851Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T12:52:53.5411791Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T12:52:53.5429842Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T12:52:53.5455271Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T12:52:53.5474579Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T12:52:53.5500141Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config 
remote.origin.url 2025-12-04T12:52:53.5518868Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T12:52:53.5543102Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T12:52:53.5561166Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T12:52:53.5585161Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T12:52:53.5603124Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:52:53.5629963Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T12:52:53.5650681Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:52:53.5676539Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T12:52:53.5696057Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T12:52:53.5722761Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T12:52:53.5765489Z Entering 'third_party/pocketfft' 2025-12-04T12:52:53.5795580Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T12:52:53.5813251Z Entering 'third_party/protobuf' 2025-12-04T12:52:53.5840337Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T12:52:53.5859183Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T12:52:53.5884751Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T12:52:53.5901912Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T12:52:53.5928881Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T12:52:53.5949278Z Entering 'third_party/psimd' 2025-12-04T12:52:53.5975511Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T12:52:53.5994167Z Entering 'third_party/pthreadpool' 2025-12-04T12:52:53.6018345Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T12:52:53.6037204Z Entering 'third_party/pybind11' 2025-12-04T12:52:53.6062218Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T12:52:53.6080550Z Entering 'third_party/python-peachpy' 2025-12-04T12:52:53.6109379Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T12:52:53.6127335Z Entering 'third_party/sleef' 2025-12-04T12:52:53.6154805Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T12:52:53.6172826Z Entering 'third_party/tensorpipe' 
2025-12-04T12:52:53.6201191Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T12:52:53.6216893Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T12:52:53.6243618Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T12:52:53.6258942Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T12:52:53.6285350Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T12:52:53.6303611Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T12:52:53.6330651Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T12:52:53.6350094Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T12:52:53.6375910Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T12:52:53.6392885Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T12:52:53.6416271Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T12:52:53.6456141Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:52:53.6484453Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:52:53.6510999Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:52:53.6537097Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:52:53.6562270Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:52:53.6589742Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:52:53.6617112Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:52:53.6641822Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:52:53.6668450Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:52:53.6695832Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 
2025-12-04T12:52:53.6721757Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.6748868Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.6774793Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.6801864Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.6828388Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.6854371Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.6880168Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.6905640Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.6930574Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.6956476Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.6982530Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7012163Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7037592Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7062143Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7085951Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7111182Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7136341Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7161081Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7188539Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7213156Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7237976Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7263150Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7289015Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7316043Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7341526Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7366228Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7393263Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7419207Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7443858Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7471111Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7497715Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7528410Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7555962Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7581909Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7607818Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7634457Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7660940Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7685861Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7711234Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7737739Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7761950Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7787732Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7813449Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7839073Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7863461Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7887846Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7914596Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7939835Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7963785Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.7989630Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8016098Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8040254Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8065726Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8090709Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8122993Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8149942Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8175736Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8201314Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8228078Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8255527Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8280979Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8305821Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8331578Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8357566Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8383141Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8408109Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8435956Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8461120Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8486589Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8512950Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8537817Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T12:52:53.8648174Z A job completed hook has been configured by the self-hosted runner administrator
2025-12-04T12:52:53.8663993Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh'
2025-12-04T12:52:53.8669637Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T12:52:53.8670070Z ##[endgroup]
2025-12-04T12:52:53.8756853Z [!ALERT!] Swap in detected! [!ALERT!]
2025-12-04T12:53:04.9275418Z [!ALERT!] Swap out detected [!ALERT!]
2025-12-04T12:53:23.4351902Z Cleaning up orphan processes